CSDL-T-1276 

CACHE  ANALYSIS  IN  A 
MULTIPROCESS  ENVIRONMENT  USING 
EXECUTION  DRIVEN  SIMULATION 

by 

John  Hamilton  Fraser  III 
August  1996 


Master  of  Science  Thesis 
Northeastern  University 


The  Charles  Stark  Draper  Laboratory,  Inc. 

555  Technology  Square,  Cambridge,  Massachusetts  02139-3563 


REPORT  DOCUMENTATION  PAGE 




1. AGENCY USE ONLY (Leave blank)


2.  REPORT  DATE 

9  Jan  97 


3.  REPORT  TYPE  AND  DATES  COVERED 


4.  TITLE  AND  SUBTITLE 

Cache  Analysis  In  A  Multiprocess  Environment  Using  Execution  Driven  Simulation 

5.  FUNDING  NUMBERS 

6.  AUTHOR(S) 

John Hamilton Fraser III

7. PERFORMING ORGANIZATION NAME(S) AND ADDRESS(ES)

Northeastern  University 

8.  PERFORMING  ORGANIZATION 

REPORT  NUMBER 

96-121 

9.  SPONSORING/MONITORING  AGENCY  NAME(S)  AND  ADDRESS(ES) 

DEPARTMENT  OF  THE  AIR  FORCE 

AFIT/CI 

2950  P  STREET 

WPAFB  OH  45433-7765 

10.  SPONSORING/MONITORING 

AGENCY  REPORT  NUMBER 

11.  SUPPLEMENTARY  NOTES 

12a.  DISTRIBUTION  AVAILABILITY  STATEMENT 

Unlimited 

12b.  DISTRIBUTION  CODE 

17.  SECURITY  CLASSIFICATION 
OF  REPORT 


18.  SECURITY  CLASSIFICATION 
OF  THIS  PAGE 


19.  SECURITY  CLASSIFICATION 
OF  ABSTRACT 


20. LIMITATION OF ABSTRACT




Cache  Analysis  in  a  Multiprocess 
Environment  Using  Execution  Driven 

Simulation 


A  Thesis  Presented  By 


John  Hamilton  Fraser  III 


to 

The  Department  of  Electrical  Engineering 

in  partial  fulfillment  of  the  requirements 
for  the  degree  of 

Master  of  Science 

in  the  field  of 

Electrical  Engineering 
(Computer  Engineering  Concentration) 


Northeastern  University 
Boston,  Massachusetts 


August  30,  1996 


Abstract 


Cache  memory  is  commonly  used  to  bridge  the  gap  between  microprocessor  and  memory 
speeds.  A  wide  variety  of  cache  designs  are  possible,  so  some  method  is  required  to  evaluate  the 
benefits  and  costs  of  the  various  alternatives.  Trace  driven  simulation  is  commonly  used  by  the 
computer  architecture  community  to  analyze  potential  designs.  Traces  of  benchmark  execution  are 
applied  to  a  model  of  the  design  under  study.  Most  of  today’s  computer  systems  have  been  optimized 
based  on  results  of  these  studies. 

One  important  aspect  that  is  frequently  ignored  in  trace  driven  studies  is  the  effect  of  the 
operating  system  and  multiprogramming  on  cache  performance;  most  traces  consist  only  of  a  single 
program’s  execution.  It  has  been  acknowledged  in  the  past  that  this  overhead  introduces  interference 
which  limits  the  benefits  of  new  designs,  but  evaluations  using  multiprogrammed  traces  have  been 
neglected  due  to  the  lack  of  readily  available  tools  that  can  capture  such  traces. 

In  this  research  we  describe  a  new  tracing  system  that  allows  the  capture  of  both  operating 
system  and  multiprogrammed  execution  data.  Cache  performance  is  studied  using  multiprogrammed 
traces  of  the  SPEC  benchmarks.  We  study  the  effects  of  considering  multiple  tasks  on  the  cache  miss 
rate.  The  performance  variation  is  primarily  due  to  the  presence  of  context  switches.  In  an  attempt 
to  extend  this  work,  we  develop  an  analytical  model  that  is  used  to  synthetically  incorporate  context 
switches  into  a  single  process’  trace. 

We  have  found  that  the  operating  system  introduces  a  small  but  persistent  overhead  to 
cache  performance.  Additional  processes  have  an  even  greater  impact,  which  increases  as  the  level 
of  multi-tasking  increases.  Spatial  locality  is  not  significantly  affected  by  these  conditions,  but  the 
temporal  locality  of  a  program  is  substantially  reduced  by  the  presence  of  context  switches. 


Contents 

1 Introduction
2 Background
2.1 Cache Performance
2.2 Cache Analysis
2.2.1 Methods
2.2.2 Issues
2.3 Current Work
3 ATOM Overview
3.1 General Use
3.2 Operating System Implementation
3.2.1 Set Up
3.2.2 Programming
3.2.3 Execution
3.3 Problem Areas
3.3.1 ATOM Limitations
3.3.2 Kernel Limitations
3.3.3 Program Size
3.3.4 Execution Speed
3.3.5 Re-entrance
3.3.6 Reference Stream Accuracy
3.3.7 Portability
4 Test Methodology
4.1 Cache Model
4.2 Verification
4.3 Simulations
4.3.1 Platform Information
4.3.2 Test Parameters
5 Simulation Results
5.1 Cache Workload
5.2 Impact on Process Performance
5.3 Process Interference
5.4 Impact on Cache Performance
5.5 Summary
5.6 Future Work
6 Context Switch Model
6.1 Theory
6.2 Development
6.3 Implementation
6.3.1 Frequency
6.3.2 Impact
6.4 Testing
7 Model Evaluation
7.1 Individual Results for n=1
7.2 Individual Results for n=2
7.3 Interference Comparison
7.4 Summary
7.5 Future Work
8 Conclusions
9 Contributions of this Thesis
10 Acknowledgments
11 Bibliography
A Program Source Code
A.1 Input Format
A.2 Output Format
A.3 Cache Model Library
A.4 Kernel Instrumentation File
A.5 Kernel Analysis File
A.6 Program Instrumentation File
A.7 Program Analysis File
A.8 Sample Tool Description File
A.9 Model Library
A.10 Model Analysis File
B Tables of Simulation Results
B.1 Compress Alone
B.2 GCC Alone
B.3 Espresso Alone
B.4 Alvinn Alone
B.5 Compress w/ Operating System
B.6 GCC w/ Operating System
B.7 Espresso w/ Operating System
B.8 Alvinn w/ Operating System
B.9 Compress and GCC w/ Operating System
B.10 Compress and Espresso w/ Operating System
B.11 GCC and Espresso w/ Operating System
B.12 Compress w/ Model, n=1
B.13 GCC w/ Model, n=1
B.14 Espresso w/ Model, n=1
B.15 Alvinn w/ Model, n=1
B.16 Compress w/ Model, n=2
B.17 GCC w/ Model, n=2
B.18 Espresso w/ Model, n=2


List  of  Figures 

1 Program Block Diagram
2 Operating System Instruction Fetches Over Repeated Program Execution
3 Operating System Instruction Fetches Within Same Program Execution
4 Percent of Total References From Operating System
5 Percent Increase in Number of References by Including Operating System
6 Distribution of Reference Types
7 Process Instruction Reference Miss Rates For Compress
8 Process Data Reference Miss Rates For Compress
9 Process Instruction Reference Miss Rates For GCC
10 Process Data Reference Miss Rates For GCC
11 Process Instruction Reference Miss Rates For Espresso
12 Process Data Reference Miss Rates For Espresso
13 Process Instruction Reference Miss Rates For Alvinn
14 Process Data Reference Miss Rates For Alvinn
15 Percent Misses From Instructions, Compress
16 Percent Misses From Instructions, GCC
17 Percent Misses From Instructions, Espresso
18 Percent Misses From Instructions, Alvinn
19 Percent Self Overwritten for Compress
20 Percent Self Overwritten for GCC
21 Percent Self Overwritten for Espresso
22 Percent Self Overwritten for Alvinn
23 Instruction Cache Miss Rates With Compress
24 Data Cache Miss Rates With Compress
25 Instruction Cache Miss Rates With GCC
26 Data Cache Miss Rates With GCC
27 Instruction Cache Miss Rates With Espresso
28 Data Cache Miss Rates With Espresso
29 Instruction Cache Miss Rates With Alvinn
30 Data Cache Miss Rates With Alvinn
31 Percent Instruction Misses From Kernel
32 Percent Data Misses From Kernel
33 Time Space Diagram of Process Execution
34 Execution Interval Given Some Probability [0..1]
35 Actual Distribution of Random Execution Intervals
36 Probability of Cache Blocks Being Overwritten; F=100
37 Probability of Cache Blocks Being Overwritten; F=1000
38 Model Results for Compress; n=1
39 Model Results for GCC; n=1
40 Model Results for Espresso; n=1
41 Model Results for Alvinn; n=1
42 Model Results for Compress; n=2
43 Model Results for GCC; n=2
44 Model Results for Espresso; n=2
45 Percent Self Overwritten for Compress; n=1
46 Percent Self Overwritten for GCC; n=1
47 Percent Self Overwritten for Espresso; n=1
48 Percent Self Overwritten for Alvinn; n=1
49 Percent Self Overwritten for Compress; n=2
50 Percent Self Overwritten for GCC; n=2
51 Percent Self Overwritten for Espresso; n=2

List  of  Tables 

1 Simulated Cache Parameters
2 Benchmark References
3 Benchmark with Operating System References
4 Concurrent Benchmarks with Operating System References
5 System Overhead Comparison
6 Compress Alone
7 GCC Alone
8 Espresso Alone
9 Alvinn Alone
10 Compress w/ Operating System, Compress Data
11 Compress w/ Operating System, Operating System Data
12 Compress w/ Operating System, Combined Data
13 GCC w/ Operating System, GCC Data
14 GCC w/ Operating System, Operating System Data
15 GCC w/ Operating System, Combined Data
16 Espresso w/ Operating System, Espresso Data
17 Espresso w/ Operating System, Operating System Data
18 Espresso w/ Operating System, Combined Data
19 Alvinn w/ Operating System, Alvinn Data
20 Alvinn w/ Operating System, Operating System Data
21 Alvinn w/ Operating System, Combined Data
22 Compress and GCC w/ Operating System, Compress Data
23 Compress and GCC w/ Operating System, GCC Data
24 Compress and GCC w/ Operating System, Operating System Data
25 Compress and GCC w/ Operating System, Combined Data
26 Compress and Espresso w/ Operating System, Compress Data
27 Compress and Espresso w/ Operating System, Espresso Data
28 Compress and Espresso w/ Operating System, Operating System Data
29 Compress and Espresso w/ Operating System, Combined Data
30 GCC and Espresso w/ Operating System, GCC Data
31 GCC and Espresso w/ Operating System, Espresso Data
32 GCC and Espresso w/ Operating System, Operating System Data
33 GCC and Espresso w/ Operating System, Combined Data
34 Compress w/ Model, n=1
35 GCC w/ Model, n=1
36 Espresso w/ Model, n=1
37 Alvinn w/ Model, n=1
38 Compress w/ Model, n=2
39 GCC w/ Model, n=2
40 Espresso w/ Model, n=2


1  Introduction 


Improvements in processor technology are far outstripping the advances
made in memory circuit design. As processors execute faster and faster, the latency experienced
when accessing memory becomes a major limitation. Faster memory is available, but at greater
cost. An economical balance between performance and price is achieved through the use of memory
caches. The main memory is implemented using less expensive but slower technologies such as DRAM,
making a large memory feasible. A much smaller memory cache is constructed of faster (and more
expensive) memory circuits, such as SRAM, to be used as a buffer between the main memory and
the processor. Sections of the data stored in main memory are copied into the cache, allowing them to
be accessed much more quickly. Which sections of memory are copied into the cache, and how the
information is maintained, is a function of the design of the cache [22, 36, 52].

A  cache  is  effective  in  reducing  the  average  memory  access  time  because  of  certain  properties 
found  in  software.  The  collection  of  instruction  and  data  addresses  used  by  a  program  over  some 
time  interval  is  referred  to  as  its  working  set  [3]  or  footprint  [56].  The  working  set  may  change  as 
the  program  executes,  but  it  generally  exhibits  two  properties: 

1.  spatial  locality,  and 

2.  temporal  locality. 

Spatial  locality  refers  to  the  property  that  addresses  tend  to  cluster  together  in  space.  References  may 
be  sequential  or  in  some  other  way  structured,  denoting  a  high  degree  of  spatial  locality.  Similarly, 
temporal  locality  refers  to  the  property  that  addresses  tend  to  cluster  together  in  time.  Addresses 
in  the  working  set  may  be  used  repeatedly  during  their  lifetime,  denoting  a  high  degree  of  temporal 
locality. 

These  two  properties  allow  caches  to  improve  memory  system  performance.  A  memory 
reference  which  is  not  in  the  cache  causes  a  cache  miss.  The  data  at  the  referenced  location  and 
some  number  of  its  adjoining  locations  is  brought  into  the  cache.  Due  to  locality,  it  is  likely  that 
either  the  same  location  (temporal),  or  nearby  locations  (spatial),  will  be  referenced  in  the  near 
future.  When  these  references  occur,  they  are  already  present  in  the  cache  and  a  cache  hit  ensues. 
On  a  hit,  the  data  can  be  very  rapidly  supplied  to  the  processor,  much  faster  than  an  access  to  the 
main memory. The improvement provided by a cache becomes a function of how often a hit occurs
and how fast the addressed data can be provided to the processor, balanced by the delay introduced
when  servicing  a  cache  miss. 
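
As an illustration of how the two forms of locality turn into cache hits, the following minimal sketch models a small direct mapped cache in Python. It is not one of the tools used in this research; the function name simulate_direct_mapped, the cache geometry (4 lines of 4-byte blocks), and the three address streams are assumptions chosen only for illustration.

```python
def simulate_direct_mapped(addresses, num_lines=4, block_size=4):
    """Return (hits, misses) for a byte-address stream on a direct mapped cache."""
    lines = [None] * num_lines       # block number currently held by each cache line
    hits = misses = 0
    for addr in addresses:
        block = addr // block_size   # memory block containing this address
        index = block % num_lines    # cache line that the block maps to
        if lines[index] == block:
            hits += 1
        else:
            misses += 1
            lines[index] = block     # a miss fetches the whole block
    return hits, misses


# Spatial locality: 16 consecutive addresses share 4 blocks.
sequential = list(range(16))
# Temporal locality: the same 4 addresses are reused 4 times.
repeated = [0, 4, 8, 12] * 4
# Neither: 16 distinct blocks, each referenced once.
scattered = [i * 16 for i in range(16)]

print(simulate_direct_mapped(sequential))  # (12, 4)
print(simulate_direct_mapped(repeated))    # (12, 4)
print(simulate_direct_mapped(scattered))   # (0, 16)
```

In the first two streams only the initial reference to each block misses; the third stream has no locality to exploit, so every reference misses.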


The  critical  nature  of  caches  has  led  to  extensive  study  of  various  designs,  configurations, 
and  enhancements,  all  oriented  towards  increasing  cache  performance.  There  are  diverse  methods 
available  to  assess  the  alternatives,  ranging  from  prototyping  to  simulation.  Regardless  of  the 
method,  the  accuracy  of  the  evaluation  is  paramount.  The  criteria  used  to  justify  any  evaluation  must 
accurately  reflect  the  environment  to  which  the  cache  will  be  subjected,  otherwise  any  conclusions 
are  questionable. 

One of the major shortcomings of the most common evaluation methods is that the effects
of the operating system and of multiple user processes being executed are neglected. The methods are
simpler, but ignore a major aspect of the computer’s architecture. Several past efforts have shown
that the related impact is significant enough to warrant inspection [1, 2, 8, 11, 12, 41], and including
it certainly gives a more realistic representation of the execution environment. The drawback is the
difficulty of incorporating these considerations into the evaluation. There is generally some overhead
required, in time and/or resources, to perform such complex tests.

The  research  described  here  focused  on  developing  a  tool  to  capture  multiprocess  state 
information  and  perform  subsequent  evaluations,  exploring  its  capabilities  with  studies  in  both 
detailed  cache  simulations  and  testing  an  analytical  model.  This  thesis  is  organized  as  follows.  In 
section  2  cache  performance  and  evaluation  methods  are  reviewed.  Section  3  describes  the  analysis 
tool  ATOM,  and  how  it  can  be  used  specifically  on  the  operating  system  and  in  a  multi-process 
environment.  Section  4  discusses  the  methodology  followed  in  this  research  and  outlines  the  tests 
performed.  Section  5  reviews  the  results  of  simulations  performed  in  the  multi-process  environment. 
In  section  6  an  analytical  model  is  presented  that  can  be  used  to  simplify  simulations  with  minimal 
loss  of  accuracy,  which  is  tested  in  section  7.  Section  8  concludes  the  work,  with  a  summary  of  its 
contributions in section 9. Last are section 10, the acknowledgments, and section 11, the bibliography.
Two appendices are attached: A, copies of the programs used in this research, and B, tables of all
simulation results.




2  Background 

2.1  Cache  Performance 

Cache  performance  encompasses  a  variety  of  issues.  At  the  most  basic  level,  the  performance 
of  a  cache  can  be  defined  by  its  miss  rate  (or  ratio),  the  percentage  of  references  applied  to  the  cache 
whose data was not already present in the cache. Alternatively, one may refer to the hit rate, which
is the percentage already present. The two values represent equivalent information, since the
miss rate equals one minus the hit rate and vice versa. Depending on the system and evaluation
performed, however, this metric may be an oversimplification. The goal of the cache is to improve
the average memory access time, which is a function of more than just the miss rate. It is entirely
possible for a cache to have a low miss rate but, due to other considerations, a long access time,
thus limiting its usefulness. Hence many evaluations are based not on miss rates, but rather on
the cache latency [7, 8, 41, 47]. The drawback is that an evaluation of that magnitude
is much more difficult and requires modeling a greater portion of the system under test, so evaluations
frequently focus simply on miss rates anyway.
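
To make the distinction between miss rate and access time concrete, the sketch below uses the common textbook formulation of average memory access time (hit time plus miss rate times miss penalty). This formula is a standard approximation and is not taken from this thesis; the function name average_access_time and all cycle counts are hypothetical values chosen for illustration.

```python
def average_access_time(hit_time, miss_rate, miss_penalty):
    """Average memory access time, in the same units as hit_time and miss_penalty."""
    return hit_time + miss_rate * miss_penalty


# Hypothetical design A: simple cache, fast hits, higher miss rate.
design_a = average_access_time(hit_time=1.0, miss_rate=0.05, miss_penalty=20.0)
# Hypothetical design B: more complex cache, lower miss rate, slower hits.
design_b = average_access_time(hit_time=2.0, miss_rate=0.03, miss_penalty=20.0)

print(f"A: {design_a:.2f} cycles, B: {design_b:.2f} cycles")
```

Under these assumed numbers, design B has the lower miss rate yet the longer average access time, which is the situation described above.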

Regardless  of  the  standard  used,  the  cache  miss  rate  is  important,  as  the  average  access 
time  does  depend  on  this  value.  To  understand  the  significance  of  the  miss  rate,  it  is  important  to 
understand  the  various  sources  of  misses.  A  program  generates  a  stream  of  memory  references  as  it 
executes,  which  are  applied  to  the  cache.  Cache  misses  are  caused  when  an  address  in  the  reference 
stream  is  not  present  in  the  cache.  This  can  occur  for  basically  three  reasons  [3,  55]: 

Start  Up  The  first  form  of  miss  is  caused  the  first  time  that  a  particular  address  is  referenced  in 
the  stream.  Since  it  has  not  been  referenced  before,  there  is  no  expectation  that  that  memory 
location  would  have  been  copied  into  the  cache.  Such  misses  are  encountered  primarily  when 
a  program  begins  executing  and  all  references  are  new,  also  called  the  warm  up  phase  of  the 
cache.  The  size  of  the  cache  and  the  program  both  contribute  to  the  length  of  this  phase. 
As  the  working  set  changes,  additional  start  up  misses  are  encountered  as  new  locations  are 
referenced. 

Though  a  certain  address  may  not  have  been  previously  referenced,  it  is  still  possible  that  its 
data  is  already  in  the  cache.  When  data  is  copied  from  memory  to  the  cache,  it  is  moved  in 
quantities  called  blocks.  A  block  is  usually  larger  than  a  single  memory  access,  so  a  single  miss 
fetches more data than is required for a single access. If a location is referenced that resides in
a block already fetched, it will hit, even though that particular address may be new. This is
only effective for memory references that are primarily sequential, such as instruction fetches,
in which case a large block size is beneficial. For footprints with less locality, such as data loads
and stores, large blocks can actually have the reverse effect, bringing in excess data which is
never used.

Another  technique  to  prevent  start  up  misses  is  the  use  of  prefetching  [14,  15,  52].  This  is 
essentially  an  attempt  to  predict  what  locations  will  be  referenced  in  the  near  future,  and  fetch 
them  into  the  cache  before  they  are  requested.  The  method  of  prediction  can  be  hardware  or 
software  based,  and  must  be  accurate  for  prefetching  to  be  effective.  If  data  is  falsely  predicted 
and  fetched  into  the  cache,  it  may  overwrite  “live”  data  (live  meaning  that  it  is  still  part  of  the 
current working set), causing cache pollution. Additional enhancements such as a prefetch
buffer filter or victim cache can be used to limit this impact [22]. Using prefetching can improve
miss rates; however, it also increases the traffic between the cache and memory. An accurate
evaluation cannot consider only miss rates with this technique, otherwise its drawbacks will
be obscured.

Capacity  The  second  form  of  miss  is  due  to  the  finite  cache  size.  A  large  program  cannot  possibly 
fit  its  entire  working  set  into  a  small  cache.  As  various  parts  of  the  working  set  are  used,  they 
will  overwrite  other  live  data.  The  obvious  solution  is  to  use  a  larger  cache,  but  at  additional 
expense.  Another  potential  solution  is  to  analyze  the  locations  used  in  the  working  set.  The 
references  may  cluster  around  certain  blocks  while  others  are  unused.  Changing  the  mapping 
of  addresses  to  cache  lines  (or  indices)  may  allow  the  references  to  be  better  distributed  across 
all  cache  lines  [7].  This  technique  is  also  an  effective  counter  for  the  next  type  of  miss,  which 
together  with  capacity  misses  are  sometimes  referred  to  as  intrinsic  interference. 

Conflict  The  third  form  of  miss  is  due  to  conflict  between  two  references.  If  two  addresses  in  the 
working  set  map  to  the  same  cache  line,  each  time  they  are  referenced  a  cache  miss  may  result 
(depending  on  the  actual  pattern  of  references).  Again,  altering  the  mapping  algorithm  may 
reduce  the  amount  of  conflict  in  a  given  reference  stream  by  spreading  out  clumps.  Another 
option  is  to  use  an  associative  cache  [22,  52].  In  this  form  of  cache,  each  cache  line  (sometimes 
called  set)  can  maintain  multiple  blocks,  so  multiple  locations  can  map  to  the  same  line  without 
conflict. The number of blocks held in each line is referred to as the set size or associativity
of that cache, and can vary from 1 to the maximum possible given the available chip area.
This  type  of  cache  can  be  pictured  as  a  two  dimensional  array  of  blocks,  with  the  vertical 
dimension  the  number  of  lines  and  the  horizontal  the  associativity.  The  bounding  cases  are 
a  direct  mapped  cache  with  an  associativity  of  one,  and  a  fully  associative  cache  with  only 
one  line.  The  drawback  is  that  for  a  finite  cache  area,  increasing  the  associativity  decreases 
the number of cache lines, so each line in the cache has more locations mapped to it and a
correspondingly heavier load. Also, associative caches are frequently slower, which should be a
factor in comprehensive evaluations.

These  three  categories  comprise  the  basic  types  of  misses  found  in  a  process’  reference  stream.  They 
must  be  considered  in  even  a  minimal  performance  measurement,  although  there  are  other  cache 
components  that  may  improve  memory  system  performance  without  affecting  the  miss  rate. 
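
The effect of associativity on conflict misses can be sketched with a small model of the two dimensional array of blocks described above. The class name SetAssociativeCache, the least recently used replacement policy, and the cache dimensions are assumptions made for this illustration; the text above does not prescribe a replacement policy.

```python
from collections import OrderedDict


class SetAssociativeCache:
    """Cache with num_lines lines (sets), each holding `associativity` blocks."""

    def __init__(self, num_lines, associativity, block_size):
        self.num_lines = num_lines
        self.associativity = associativity
        self.block_size = block_size
        # One ordered dict per line; key order tracks recency of use (assumed LRU).
        self.lines = [OrderedDict() for _ in range(num_lines)]
        self.hits = 0
        self.misses = 0

    def access(self, addr):
        block = addr // self.block_size
        line = self.lines[block % self.num_lines]
        if block in line:
            line.move_to_end(block)        # mark as most recently used
            self.hits += 1
            return True
        self.misses += 1
        if len(line) >= self.associativity:
            line.popitem(last=False)       # evict the least recently used block
        line[block] = True
        return False


def count_misses(cache, addresses):
    for addr in addresses:
        cache.access(addr)
    return cache.misses


# Two addresses whose blocks (0 and 8) map to the same line in both caches.
stream = [0, 128] * 4

# Both caches hold 8 blocks of 16 bytes; only the organization differs.
direct_mapped = SetAssociativeCache(num_lines=8, associativity=1, block_size=16)
two_way = SetAssociativeCache(num_lines=4, associativity=2, block_size=16)

print(count_misses(direct_mapped, stream))  # 8: every reference conflicts
print(count_misses(two_way, stream))        # 2: only the start up misses remain
```

With equal capacity, the direct mapped organization misses on every reference because the two blocks evict each other, while the two-way organization keeps both blocks in the same line.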

Other  cache  enhancements  which  do  not  directly  affect  miss  rates  are  usually  related  to 
access  times.  Techniques  such  as  using  a  Translation  Lookaside  Buffer  (TLB)  [49]  can  perform 
cache  lookups  and  virtual  address  conversions  in  parallel.  Other  methods  include  using  hierarchies 
of  caches,  such  as  a  small  direct  mapped  cache  on  chip  and  a  second  level  larger  cache,  possibly 
associative,  off  chip.  Using  combinations  of  caches  can  potentially  improve  the  performance  more 
than  a  single  highly  complex  cache  [52].  In  some  instances  an  entire  cache  is  not  added,  but  various 
buffers  or  filters  are  accommodated,  such  as  the  prefetch  buffer  or  victim  cache  [7]. 

The  cache  performance  will  depend  on  many  characteristics  of  the  cache.  Some  of  the  most 
basic are its size and structure, and the method it uses to resolve both hits and misses for each
reference type (instruction fetch, data read, and data write). Performance enhancing mechanisms may
also  be  included,  each  addressing  various  deficiencies.  Studies  have  shown  that  multiple  mechanisms 
in  concert  are  generally  the  most  effective  [47].  The  wide  variety  of  cache  designs  makes  the  ability 
to  evaluate  various  options  paramount,  and  there  are  concerns  that  have  yet  to  be  addressed  which 
further  complicate  analysis. 

So  far  in  this  discussion,  caches  have  been  considered  in  an  idealized  environment.  Modern 
computers do not simply execute a single program continuously until its completion. The operating
system generates its own references as system calls are requested. The operating system also
generates  references  for  processes  such  as  interrupt  services  and  other  management  tasks,  which  are 
performed  periodically.  Even  more  complex  is  a  multiprocess  environment,  with  multiple  programs 
or threads being executed. In a multitasking system there are several processes or tasks all vying
for system resources, one of which is memory. In a uniprocessor system, control is accomplished by
time  sharing.  The  various  tasks  are  executed  for  finite  intervals  and  then  execution  is  switched  to 
another  process  —  called  a  context  switch.  As  each  task  is  scheduled  and  executed,  it  generates 
its  own  reference  stream  with  unique  characteristics.  The  individual  streams  are  interleaved  by  the 
context  switches  to  yield  an  aggregate  reference  stream  which  impinges  on  the  cache  [19,  31,  56]. 

This  introduces  a  new  mechanism  causing  a  fourth  and  final  type  of  miss,  transient  cache 
misses.  When  a  process  is  swapped  out  during  a  context  switch,  the  process  or  processes  that  execute 
until  the  original  process  is  returned  will  overwrite  its  cache  data.  This  data  may  still  have  been 
live,  so  the  overwrites  may  cause  additional  cache  misses  once  the  original  process  is  restored.  This 
is  referred  to  as  extrinsic  interference  [2],  as  opposed  to  the  intrinsic  interference  discussed  above, 
and  can  be  thought  of  as  a  reload  period  after  each  context  switch  as  evicted  data  is  returned  to 
the  cache  [56].  The  impact  of  extrinsic  interference  will  magnify  with  increased  multiprogramming 
as  the  duration  of  each  swap  is  extended,  although  this  can  be  partially  negated  by  stabilizing  the 
time  quantum  that  each  process  executes. 
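
Extrinsic interference can be illustrated by interleaving two reference streams and counting the misses charged to each process. The sketch below is a simplified illustration and is not the simulator or the analytical model developed in this thesis; the round-robin scheduler interleave, the fixed quantum measured in references, the direct mapped cache, and the two synthetic working sets (chosen so that they map to the same cache lines) are all assumptions.

```python
def interleave(streams, quantum):
    """Round-robin scheduler: yield (pid, addr), switching every `quantum` references."""
    iters = [iter(s) for s in streams]
    active = list(range(len(streams)))
    while active:
        for pid in list(active):
            for _ in range(quantum):
                try:
                    yield pid, next(iters[pid])
                except StopIteration:
                    active.remove(pid)   # this process has finished
                    break


def per_process_misses(tagged_refs, num_procs, num_lines=16, block_size=16):
    """Count the misses charged to each process on a shared direct mapped cache."""
    lines = [None] * num_lines
    misses = [0] * num_procs
    for pid, addr in tagged_refs:
        block = addr // block_size
        index = block % num_lines
        if lines[index] != block:
            misses[pid] += 1
            lines[index] = block
    return misses


# Each process loops 10 times over its own 8 blocks (80 references each).
# Blocks 0..7 and 16..23 map to the same 8 lines of a 16-line cache.
proc_a = [b * 16 for b in range(8)] * 10
proc_b = [(16 + b) * 16 for b in range(8)] * 10

alone = per_process_misses(interleave([proc_a], quantum=20), num_procs=1)
shared = per_process_misses(interleave([proc_a, proc_b], quantum=20), num_procs=2)

print(alone)   # [8]: only the start up misses
print(shared)  # [32, 32]: a reload of 8 blocks after every context switch
```

In this contrived case each process reloads its entire working set after every switch, so the misses grow with the number of context switches rather than with the size of the program.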

Some  designs  call  for  the  cache  to  be  totally  flushed  (invalidated)  at  each  context  switch 
automatically.  This  might  be  appropriate  for  a  control  mechanism  such  as  the  cache  type  structure 
used  to  implement  a  TLB,  but  in  an  instruction  or  data  cache  it  is  quite  likely  that  some  of  the  live 
data  from  a  process  would  still  be  resident  when  that  process  returns  to  execution.  By  maintaining 
the  cache  data  for  as  long  as  possible,  the  extrinsic  interference  is  kept  to  a  minimum;  although  this 
does  require  additional  overhead  to  monitor  the  owner  of  each  line  of  cache  data,  and  complicates 
analysis  [22]. 

Other architecture issues can further complicate performance considerations. A multiprocessor
system is similar to what has already been discussed, but more complicated. Not only are
multiple  reference  streams  being  generated,  they  are  generated  simultaneously  and  possibly  applied 
to  multiple  caches.  Each  processor  may  maintain  its  own  memory  structure  or  they  may  share 
a  common  structure.  This  raises  the  issue  of  cache  coherency,  or  the  property  that  data  stored 
in memory is properly maintained in each location where it is represented. If multiple processors share
memory but have their own caches, care must be taken to monitor when data is in multiple caches
(shared) so that if the data is modified, it is modified in all caches. Various policies can be used
when  data  is  stored  to  the  cache,  such  as  write  through,  meaning  data  is  written  to  memory  as 
soon  as  it  is  written  to  cache,  or  write  back,  meaning  the  data  is  not  written  to  memory  until  it 


6 


is  evicted  from  the  cache.  Each  has  various  advantages  and  disadvantages,  and  in  turn  affects  the 
policy  used  to  maintain  coherence  [15,  29].  There  are  a  variety  of  other  technical  issues  as  well, 
such  as  communication  and  synchronization,  making  this  a  very  complex  design.  Even  more  radical 
departures  from  the  traditional  von  Neumann  architecture,  to  a  dataflow  architecture  for  example, 
cause  even  greater  difficulties  in  defining  evaluation  criteria  [30]. 
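
The difference in memory traffic between the write through and write back policies can be sketched for a single cache as follows. This is an illustration only: the function name memory_writes, the direct mapped organization, the allocation of a block on a store miss, and the final flush of modified blocks are assumptions, and coherence between multiple caches is not modeled.

```python
def memory_writes(store_addresses, policy, num_lines=4, block_size=16):
    """Count writes to main memory caused by a stream of stores."""
    if policy not in ("write_through", "write_back"):
        raise ValueError(f"unknown policy: {policy}")
    lines = [None] * num_lines     # block number held by each cache line
    dirty = [False] * num_lines    # modified since fetched (write back only)
    writes = 0
    for addr in store_addresses:
        block = addr // block_size
        index = block % num_lines
        if lines[index] != block:
            # Store miss: a modified block being evicted must be written back.
            if dirty[index]:
                writes += 1
            lines[index] = block   # assumed: allocate the block on a store miss
            dirty[index] = False
        if policy == "write_through":
            writes += 1            # every store goes to memory immediately
        else:
            dirty[index] = True    # defer the memory write until eviction
    writes += sum(dirty)           # assumed: flush modified blocks at the end
    return writes


# Ten stores to one block, then ten stores to a block that evicts it.
stores = [0] * 10 + [64] * 10

print(memory_writes(stores, "write_through"))  # 20
print(memory_writes(stores, "write_back"))     # 2
```

Write back reduces memory traffic when a block is stored to repeatedly, at the cost of memory holding stale data until the eviction, which is the property that complicates coherence.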

2.2  Cache  Analysis 

2.2.1  Methods 

There  are  a  variety  of  methods  available  to  evaluate  cache  performance.  General  reviews 
are  presented  in  [1,  11,  13,  60].  The  techniques  can  be  broken  down  into  various  categories: 

Analytical  Models  The  most  abstract  form  of  analysis  is  based  on  a  theoretical  prediction  derived 
from  the  test  system’s  characteristics  and  assumptions  of  how  it  is  loaded.  Developing  a  model  of 
the  system  under  test  requires  certain  assumptions  which  may  oversimplify  aspects  of  cache  design, 
neglect  relevant  characteristics  of  the  input,  or  may  not  be  sufficiently  verified  to  warrant  their  use. 
The  accuracy  of  the  evaluation  is  limited  by  the  accuracy  of  the  theoretical  model,  and  unfortunately, 
the  more  accurate  and  comprehensive  the  model,  the  more  difficult  it  is  to  solve  [3].  Some  models 
are  based  on  abstract  parameters  with  little  relation  to  the  actual  system  [31],  and  others  may 
require considerable test program characterization, to the point that other methods would be equally
suitable  [56].  The  most  successful  models  tend  to  focus  on  very  limited  aspects  of  memory  system 
performance  to  reduce  their  scope  [28,  55]. 

Hardware  Evaluation  The  antithesis  of  theoretical  analysis  is  hardware  evaluation.  In  this 
method,  the  test  system  is  implemented  and  inserted  into  some  platform.  Its  performance  can 
then  be  monitored  directly  as  the  platform  is  operated.  The  actual  analysis  is  quite  quick,  as 
the processing is conducted at the same speed as the platform; however, the test system must be
constructed,  which  may  be  a  slow  and  expensive  process.  The  other  disadvantage  is  that  to  test 
a  variety  of  alternative  designs,  each  alternative  must  be  constructed.  This  limits  the  flexibility 
and  can  be  even  more  costly.  Rapid  prototyping  can  make  this  method  more  attractive,  and  some 
examples  have  been  found  in  [11,  24].  Using  techniques  of  hardware  emulation  can  also  be  more 
efficient,  although  they  are  slower  [40]. 


Trace  Based  Simulation  By  far  the  most  common  form  of  analysis  is  trace  driven  simulation. 
A  trace  of  program  references  is  generated  and  applied  to  a  model  of  the  system  being  tested.  The 
model  is  simulated  in  software,  and  can  be  as  complex  as  accuracy  dictates.  A  software  model  is 
very  flexible,  but  simulations  are  slower  to  compute.  Also,  the  traces  must  somehow  be  stored, 
which  requires  a  great  deal  of  memory,  although  they  can  be  reused.  The  trace  can  be  as  complex 
as  desired,  and  there  are  a  variety  of  methods  that  can  be  used  to  generate  it: 

Synthetic  Generation  Workloads  can  be  created  for  system  test  through  the  use  of  synthetic 
generators.  No  programs  need  be  executed,  reference  streams  are  simply  generated  randomly. 
Some control is provided through defining random variables and their distributions, establishing
the desired characteristics of the workload. Since it is artificially generated, however, its
accuracy  is  highly  suspect.  Various  examples  of  this  technique  can  be  found  in  [35,  46,  57,  58]. 

System  Emulation  Another  alternative  which  does  not  require  program  execution  uses  system 
emulation.  A  test  program  is  required,  but  it  is  fed  into  an  instruction  set  simulator  which 
generates  reference  stream  data.  This  pseudo  execution  of  programs  is  very  slow,  though,  and 
is  rarely  used  [60]. 

Hardware Capture The last two methods monitor the execution of a test program on some platform,
capturing the reference stream as the program executes. In hardware capture, the
platform  is  modified  so  that  as  it  executes  the  test  code,  the  references  generated  are  collected 
and  stored.  It  is  easy  to  capture  a  wide  variety  of  references  in  the  trace  working  at  this  level, 
but  this  technique  suffers  from  the  disadvantage  of  requiring  unique  hardware  and/or  costly 
modification.  The  two  most  common  forms  of  hardware  capture  have  been  accomplished  by 
modifying  the  microcode  of  the  CPU  [1,  2],  or  by  using  test  probes  inserted  into  the  system 
to electrically read the system status [11, 60]. The first can only be used with certain architectures,
however, and the latter is limited by the external visibility of data (for instance, an
on  chip  cache  could  not  be  monitored).  Once  each  reference  is  captured,  there  are  a  variety 
of  ways  to  record  it,  such  as  storing  it  in  a  buffer  and  occasionally  writing  the  buffer  to  a  file. 
The  method  must  be  able  to  record  data  as  fast  as  the  system  generates  it,  which  may  be 
a  significant  limitation.  Despite  the  disadvantages,  this  method  is  frequently  used  in  certain 
situations  where  other  methods  may  not  be  feasible,  such  as  very  complex  architectures  [5,  59]. 


Software  Capture  The  most  common  form  of  trace  generation  is  by  software  capture.  Instead  of 
modifying  the  testbed,  the  software  can  be  altered  so  that  information  about  the  program’s 
execution  is  recorded.  Again,  the  trace  is  generally  stored  in  a  buffer  until  it  can  be  written  out 
to  a  file,  although  there  are  alternatives.  Software  capture  is  more  flexible  than  hardware  based 
methods,  as  the  information  that  is  collected  can  be  easily  updated  as  evaluation  needs  change, 
but capturing all aspects of the reference stream (such as the operating system) can be difficult.
Capture  can  be  based  on  snooping  programs  [50],  interrupt  generation  [32],  or  by  explicitly 
modifying  the  test  code.  This  modification  can  occur  during  compilation  [7,  8,  25,  43,  45]  or 
can  be  applied  to  an  existing  executable  [11,  12,  13,  54]. 
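The basic structure of trace based simulation, a stored reference trace applied to a software cache model, can be illustrated as follows. This is a minimal sketch under stated assumptions: the synthetic generator, its `locality` parameter, and the direct-mapped model are hypothetical choices made for brevity and do not correspond to any of the cited tools.

```python
import random

def synthetic_trace(length, footprint, locality, seed=0):
    """Generate byte addresses inside `footprint`.

    With probability `locality` the next reference is sequential (previous + 4);
    otherwise it jumps to a random word-aligned address.
    """
    rng = random.Random(seed)
    addr = 0
    trace = []
    for _ in range(length):
        if rng.random() < locality:
            addr = (addr + 4) % footprint
        else:
            addr = rng.randrange(0, footprint, 4)
        trace.append(addr)
    return trace

def direct_mapped_misses(trace, cache_bytes, line_bytes):
    """Apply a stored trace to a direct-mapped cache model and count misses."""
    num_lines = cache_bytes // line_bytes
    tags = [None] * num_lines          # block number currently held by each line
    misses = 0
    for addr in trace:
        block = addr // line_bytes
        index = block % num_lines
        if tags[index] != block:
            tags[index] = block        # replace whatever was there
            misses += 1
    return misses

# The same stored trace is reused for every cache configuration.
trace = synthetic_trace(10000, footprint=1 << 16, locality=0.9)
for size in (1024, 4096, 16384):
    print(size, direct_mapped_misses(trace, size, 32) / len(trace))
```

Because the trace is held in a list, it can be replayed against any number of cache configurations, which is the reuse advantage noted above, at the cost of storing every reference.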

Extensions  There  are  also  various  extensions  that  can  be  used  with  the  above  techniques  to 
improve  their  efficiency.  For  instance,  one  major  drawback  of  trace  based  simulation  is  the  storage 
space  required  for  the  traces.  To  compensate,  it  is  possible  to  have  the  analysis  program  executing 
concurrently  with  the  trace  generation,  so  that  no  long  term  storage  is  required;  one  example  is 
[8].  This  does  preclude  reuse,  however.  Other  techniques  include  sampling  traces  to  reduce  their 
length,  although  this  may  affect  their  accuracy  depending  on  what  assumptions  are  made  in  the 
sampling  process  [1,  2,  6,  33,  61].  It  is  also  possible  to  simply  compress  the  trace  file,  but  this 
is  only  a  short  term  solution.  Other  extensions  include  using  various  processing  algorithms  such 
as  stack  based  processing  to  simplify  simulation  [48,  64],  or  reducing  processing  time  with  parallel 
computation  [42,  43,  63].  Analytical  models  can  be  used  in  conjunction  with  program  traces  to 
simplify  simulation  and  provide  evaluation  over  a  variety  of  system  characteristics  with  a  single 
execution  [3]. 
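As an illustration of stack based processing, the sketch below uses the classic LRU stack distance idea: a single pass over the trace yields the miss count for every fully associative LRU cache size at once. It is a simplified version written for clarity (a linear search of the stack), and the function names are assumptions; the algorithms in the cited work may differ in detail.

```python
def stack_distances(trace):
    """Return the LRU stack distance of each reference (None on first touch)."""
    stack = []          # most recently used line is at the end
    distances = []
    for line in trace:
        if line in stack:
            pos = stack.index(line)
            distances.append(len(stack) - pos)   # 1 means most recently used
            stack.pop(pos)
        else:
            distances.append(None)               # compulsory miss
        stack.append(line)
    return distances

def misses_for_size(distances, num_lines):
    """A fully associative LRU cache of num_lines lines hits when distance <= num_lines."""
    return sum(1 for d in distances if d is None or d > num_lines)

trace = ["a", "b", "c", "a", "b", "d", "a"]
d = stack_distances(trace)
print(d)                                          # [None, None, None, 3, 3, None, 3]
print([misses_for_size(d, n) for n in (1, 2, 3, 4)])   # [7, 7, 4, 4]
```

The trace is processed once, and each additional cache size costs only a pass over the much cheaper list of distances.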

2.2.2  Issues 

The  evaluation  method  used  must  accurately  reflect  the  type  of  workload  that  would  be 
present  in  a  real  system.  This  is  particularly  a  concern  when  analytical  models  are  used,  as  programs 
may  not  be  executed  at  all,  so  a  statistical  approach  is  common  [57,  58].  For  hardware  measurement 
and  trace  based  simulation,  this  problem  is  addressed  by  selecting  appropriate  programs  to  be 
executed  in  the  evaluation.  Specific  programs  known  as  benchmarks  are  used  as  accepted  standards 
for  testing  [34,  45,  49].  There  are  differences  in  workloads  depending  on  the  type  of  programs  being 
considered, whether they are technical or commercial applications [37], so generally multiple test
programs are used to ensure the evaluation is comprehensive. The better test programs will have
a large and complex footprint to exercise the cache fully, although this can make standardization
more difficult and analysis slower.

Once  a  workload  is  identified,  how  it  is  represented  and  used  in  the  analysis  can  vary.  If 
a  program  is  executed  or  traced,  there  are  a  variety  of  concerns  that  must  be  addressed  for  the 
evaluation  to  have  much  confidence  [1,  11,  13,  60]: 

Reference  Scope  The  simplest  forms  of  references  to  monitor  are  from  a  single  process  [7,  25, 
45, 61, 62], but though they are easy to capture they are also not a particularly realistic
reflection  of  cache  loading.  Even  in  this  basic  form,  care  must  be  taken  to  ensure  that  shared 
libraries  and  other  common  structures  are  captured.  A  more  realistic  reference  stream  includes 
additional  processes,  and  if  possible,  the  operating  system.  Hardware  evaluation  of  a  cache  and 
hardware  based  trace  capture  for  simulation  do  allow  capture  of  all  references,  but  as  mentioned 
before  they  have  other  drawbacks.  It  may  be  difficult  to  identify  the  source  of  particular 
references,  too,  making  analysis  more  difficult.  Through  the  use  of  comprehensive  software 
capture  mechanisms,  it  is  possible  to  capture  traces  with  multiple  processes  [8,  41].  In  its  most 
complex  form,  this  mechanism  can  also  be  used  to  capture  traces  that  include  the  operating 
system [1, 2]; however, a thorough understanding of the test system is necessary for proper
implementation. Such references are more difficult to capture, and present a new problem
in processing. The multiprocess environment is non-deterministic: the reference stream can
vary  even  for  execution  of  the  same  test  programs  as  scheduling  and  interrupts  change  the 
execution  pattern.  For  a  truly  accurate  comparison,  all  tests  must  be  performed  from  a  single 
stored  trace,  or  they  must  all  be  performed  concurrently  from  the  stream  as  it  is  generated 
and  processed  [8]. 

Reference Length Another accuracy problem with reference streams is their length. As caches
increase  in  size,  more  references  are  required  to  fully  exercise  them.  A  large  cache  can  contain 
a  large  footprint,  so  a  long  program  is  needed  to  generate  such  a  footprint.  This  is  particularly 
relevant  for  RISC  machines,  which  will  have  significantly  longer  traces  for  a  given  program 
because of the increased number of instructions. Current practice considers on the order of 100
million to 10 billion references to be adequate [8]. Hardware evaluation places no constraint
on program execution, but trace based methods may be limited. Early tracing mechanisms
could not generate long enough traces, so shorter traces were stitched together [1, 2]. In other
cases,  single  process  traces  were  interleaved  to  approximate  a  multiprocess  environment  [56]. 
Recently,  more  robust  methods  have  become  available  so  that  such  artificial  measures  are  not 
required  [13,  20].  Long  traces  are  difficult  to  manage  because  of  the  storage  space  they  require. 
Analysis  can  be  conducted  on  the  fly  so  the  traces  are  used  as  they  are  generated  [8],  or  the 
traces  can  be  sampled  to  reduce  their  length  [3]. 

Platform  Impact  The  operating  system  and  compiler  used  affect  cache  performance.  The  relative 
location  of  a  program’s  instructions  and  data  will  affect  the  amount  of  conflict  since  those 
locations  determine  which  cache  line  each  will  be  mapped  to.  Other  considerations  such  as 
data  alignment,  prefetch/flush  commands,  and  program  scheduling  will  also  affect  the  reference 
stream.  The  compiler  generates  code  optimized  for  a  certain  physical  memory  system,  so 
may  not  be  ideal  for  the  test  memory  systems  being  considered.  For  the  purposes  of  most 
evaluations,  this  effect  is  considered  to  be  equivalent  across  all  designs,  and  can  be  ignored, 
particularly  by  using  the  least  optimized  code  possible  [69]. 

The  memory  system  used  on  the  platform  will  also  affect  the  evaluations  performed  with  it. 
The  size  of  the  memory  can  produce  page  faults  and  other  activities,  which  in  turn  generates 
additional  overhead  references  that  would  not  have  occurred  in  the  modeled  system.  Other 
systems  may  dynamically  schedule  activities  based  on  the  system  state,  which  may  include 
memory  system  performance,  so  ordering  of  events  may  be  subtly  altered. 

In  certain  architectures,  the  scheduling  of  references  is  linked  directly  to  the  memory  system 
performance.  For  instance,  one  possible  method  to  hide  the  cache  latency  is  to  generate  a 
context  switch  on  any  cache  miss.  For  this  to  be  viable,  the  overhead  of  performing  a  context 
switch  must  be  less  than  the  latency  to  service  a  cache  miss.  If  this  is  the  case,  the  cache 
performance  then  plays  a  major  role  in  defining  the  reference  stream.  One  solution  used 
in  [38]  is  to  not  only  simulate  the  cache,  but  the  pipeline  and  instruction  set  as  well.  The 
test program executable file is fed into the simulation, which executes it “virtually”. Such a
simulation  is  very  comprehensive  but  also  quite  complex.  Parallel  systems  present  a  similar 
problem.  References  may  be  generated  for  one  system  and  a  variety  of  memory  configurations 
can  be  tested,  but  any  changes  to  the  architecture  of  the  underlying  system  may  totally 
invalidate the accuracy of the reference stream. Also, multiple reference streams are being
generated simultaneously, either being applied to the same cache or multiple caches that must
remain  consistent.  Generally,  such  complex  architectures  dictate  certain  types  of  evaluation 
methods,  using  either  synthetic  [46]  or  hardware  monitored  traces  [59]  for  analysis.  Another 
option  is  to  capture  robust  traces  with  more  information  than  just  simple  addresses  so  that 
the  execution  stream  can  be  re-created  for  a  variety  of  systems  [26,  32]. 

Reference  Mapping  When  a  reference  is  applied  to  the  cache,  it  is  mapped  onto  a  cache  line. 
A  simple  hashing  of  the  address  bits  may  be  used,  or  a  more  complex  algorithm,  possibly 
including  other  information  such  as  the  process  identifier  [52].  The  algorithm  can  vary  with 
the  system  and  depending  on  how  addresses  are  collected  it  may  be  relevant.  Depending  on  the 
capture  method,  the  addresses  generated  may  also  be  virtual  or  physical.  Virtual  addresses 
may be used to model caches; however, this is a simplification. The actual memory system
must  at  some  point  convert  all  addresses  to  physical  form.  This  conversion  affects  how  lines 
are  mapped  from  memory  to  the  cache,  so  it  is  relevant  to  cache  performance.  Unfortunately, 
converting  to  physical  addresses  is  a  very  complex  task  that  requires  considerably  more  system 
state  information  than  is  provided  by  a  basic  reference  trace.  Since  the  placement  of  programs 
in  memory  affects  their  mapping  into  the  cache,  the  loading  of  programs  into  memory  is  also 
relevant,  although  this  is  usually  controlled  by  the  operating  system. 
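The mapping concerns above can be sketched in a few lines. The example assumes a direct-mapped cache with power-of-two geometry; the XOR folding of the process identifier and the single-entry page table are hypothetical illustrations, not the scheme of [52] or of any particular machine.

```python
def line_index(addr, line_bytes, num_lines):
    """Simple bit selection: drop the offset bits, keep the low index bits."""
    return (addr // line_bytes) % num_lines

def line_index_with_pid(addr, pid, line_bytes, num_lines):
    """Hypothetical hashed mapping that folds the process identifier into the index."""
    return ((addr // line_bytes) ^ pid) % num_lines

def translate(vaddr, page_table, page_bytes=4096):
    """Virtual to physical translation using an assumed page number -> frame table."""
    return page_table[vaddr // page_bytes] * page_bytes + vaddr % page_bytes

# Two processes using the same virtual address collide under bit selection...
print(line_index(0x1000, 32, 256))                 # 128 for either process
# ...but are separated when the process identifier is folded in.
print(line_index_with_pid(0x1000, 1, 32, 256))     # 129
print(line_index_with_pid(0x1000, 2, 32, 256))     # 130

# An 8 KB cache is larger than a 4 KB page, so translation changes the index.
paddr = translate(0x1000, {1: 6})
print(hex(paddr), line_index(paddr, 32, 256))      # 0x6000 0
```

The last case shows why a virtually addressed model is a simplification: once the cache exceeds the page size, the line chosen depends on where the operating system placed the page.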

There  are  additional  concerns  relevant  to  particular  methods.  If  traces  are  captured,  care 
must  be  taken  so  that  the  act  of  tracing  does  not  affect  the  trace  generated.  Hardware  capture 
methods  tend  to  be  non-intrusive,  but  have  other  drawbacks.  Software  based  methods  in  particular 
are  very  intrusive  since  they  modify  the  test  programs,  and  certain  measures  must  be  taken  to 
compensate  [1,  11,  13,  60]: 

Address Skewing The code added to a test program will change the various addresses used for both
instruction  fetches  and  data  accesses.  If  the  addresses  during  execution  are  used  directly  for 
the  analysis,  the  results  will  be  skewed.  Instead,  the  addresses  must  be  calculated  based  on 
what  the  reference  position  would  have  been  without  tracing.  This  is  normally  handled  by  the 
trace  generation  software,  and  can  be  transparent  to  the  simulation  model. 

Processing  Skewing  The  additional  code  inserted  into  a  program  can  also  cause  the  processing 
characteristics  of  the  test  program  to  be  skewed.  The  added  code  may  make  additional  calls 
to system resources or generate additional interrupts. The capture mechanism should ideally
identify the source of references so they can be discarded if not generated by the original test
program,  although  this  is  difficult  when  the  operating  system  is  considered. 

Program  Size  Since  program  size  is  increased,  certain  aspects  of  execution  will  be  changed  such 
as  paging.  The  larger  programs  will  occupy  more  memory  and  hence  require  greater  system 
overhead  to  manage. 

Program Speed The program speed is related to the program’s size. The additional code introduced
into programs can easily slow down their execution by an order of magnitude [8]. The
more processing introduced by tracing, the greater the slowdown will be. This affects the
accuracy of traces in two ways. Longer programs will have a disproportionate number of real-time
interrupts during their execution. Some form of scaling must be used so the frequency of
this type of interrupt is reduced within the trace. Neglecting to perform the service routine
is possible, but may affect system performance. The longer programs will also have a
disproportionate  number  of  context  switches  as  the  additional  code  can  both  cause  switches 
as  well  as  slow  down  the  original  program  so  that  less  is  accomplished  during  the  maximum 
execution  interval  allowed  by  the  scheduler. 

Once  such  concerns  are  addressed  for  a  given  evaluation  methodology,  an  analysis  can  be  performed 
with  a  great  deal  of  confidence  in  its  results. 
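The compensation for address skewing described above amounts to mapping each instruction address in the instrumented program back to the address it occupied before tracing code was inserted. The sketch below shows one way such a table could be built; the data layout (`inserted_before`, a map from original address to bytes of tracing code placed ahead of it) is a hypothetical simplification and is not a description of how any specific trace generator does it.

```python
def build_address_map(original_addrs, inserted_before):
    """Map each instrumented instruction address back to its original address.

    original_addrs  -- addresses of the instructions in the untraced program
    inserted_before -- original address -> bytes of tracing code inserted ahead of it
    """
    shift = 0
    back = {}
    for addr in sorted(original_addrs):
        shift += inserted_before.get(addr, 0)   # later code moves down by this much
        back[addr + shift] = addr
    return back

original = [0x100, 0x104, 0x108, 0x10C]
inserted = {0x104: 8, 0x10C: 8}                 # two memory references get 8 bytes each
back = build_address_map(original, inserted)
for new, old in sorted(back.items()):
    print(hex(new), "->", hex(old))
```

With such a table, the analysis sees `0x104` for the instruction that actually executes at `0x10C`, so the simulated cache is loaded as it would have been without tracing.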

2.3  Current  Work 

As  early  as  the  late  1980’s,  the  impact  of  the  operating  system  and  additional  processes  was 
recognized  as  a  concern  in  memory  system  performance  [1,  2,  3].  More  recent  work  has  consistently 
validated  the  supposition  that  this  impact  was  significant  enough  to  warrant  further  study,  and 
should  be  included  in  any  comprehensive  memory  system  evaluation  [5,  11,  12,  13,  41,  59].  More 
importantly, as computing capability has increased, it has become possible to capture longer and more
complete traces directly, without the patchwork measures described earlier.

Much  of  the  recent  work  has  revolved  around  trace  driven  simulation  with  software  capture 
methods.  Many  studies  still  consider  cache  performance,  although  others  are  becoming  more  focused, 
looking  at  specific  areas  such  as  the  effect  different  operating  system  structures  can  have  on  memory 
system  performance  [11,  12].  Some  of  the  methods  used  are  either  proprietary  [37],  or  especially 
designed for a certain application [62]. Some generic tools have been generated, such as Epoxie,
which rewrites assembly code to generate address traces [11, 12, 13].

Another  such  tool  is  ATOM,  very  similar  to  those  found  in  [11,  12,  13,  37].  Developed  by 
DEC’s Western Research Laboratory, ATOM is a general purpose program analysis tool that can be
customized  to  perform  a  wide  variety  of  different  evaluations.  Until  recently,  ATOM  focused  on  only 
the  single  process  environment,  but  in  its  latest  versions,  it  now  has  the  capability  to  capture  traces 
that  include  the  operating  system  as  well  as  multiple  user  programs.  This  research  has  revolved 
around  refining  this  capability  and  demonstrating  its  applicability  to  cache  analysis. 


3  ATOM  Overview 

3.1  General  Use 


ATOM  (Analysis  Tools  with  OM)  [51]  is  not  a  specific  application;  rather  it  is  a  toolset  that 
can  be  used  to  produce  custom  analysis  tools.  It  provides  the  framework  to  generate  program  traces 
during  execution  and  pass  the  trace  data  to  analysis  routines  through  a  procedure  call  interface. 
The  analysis  or  simulation  program  is  actually  incorporated  into  the  test  program,  so  as  the  test 
program  is  executed,  so  is  the  tool.  This  procedure  is  commonly  referred  to  as  execution  driven 
simulation, effectively combining the act of tracing and analysis. Tracing of this type alleviates the
need  for  trace  storage,  as  well  as  the  difficulties  of  synchronizing  a  separate  analysis  program  with 
the  test  programs. 

The  analysis  performed  can  vary  a  great  deal  due  to  the  flexibility  provided  by  ATOM. 
Tracing  is  performed  on  selected  events  such  as  program  start/stop,  basic  block  boundaries,  memory 
reads and writes, instructions, or procedures. Certain types of a given event can be selected (e.g.,
a certain procedure call), or all instances of an event (e.g., every instruction). The trace capture
is  inserted  as  a  function  call  to  an  analysis  routine,  so  that  when  a  particular  event  occurs  during 
execution,  information  about  that  event  is  passed  to  the  analysis  routine  where  the  event  data  is 
recorded,  processed,  or  in  some  other  way  used  to  perform  the  desired  evaluation. 

Given  this  type  of  framework,  tools  are  quite  easy  to  generate.  For  a  simple  cache  simulator 
with  a  single  process,  the  test  program  is  instrumented  at  every  instruction  fetch  and  at  every  data 
load  or  store.  The  memory  location  referenced  by  each  instruction  is  passed  to  the  analysis  routines 
corresponding  to  that  reference  type.  Within  the  analysis  routine,  the  cache  simulation  is  performed, 
so  that  when  the  test  program  concludes,  the  simulation  is  completed. 
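The structure of such an execution driven simulator can be mimicked in a few lines. The sketch below is a Python stand-in, not ATOM code: the `CacheModel` class plays the role of the analysis routines, and the explicit `sim.reference(...)` calls inside `instrumented_sum` play the role of the calls that instrumentation would insert at each instruction fetch and data access. All names and addresses are illustrative assumptions.

```python
class CacheModel:
    """Direct-mapped cache updated one reference at a time; no trace is stored."""

    def __init__(self, num_lines=64, line_bytes=32):
        self.num_lines = num_lines
        self.line_bytes = line_bytes
        self.tags = [None] * num_lines
        self.refs = {"ifetch": 0, "load": 0, "store": 0}
        self.misses = {"ifetch": 0, "load": 0, "store": 0}

    def reference(self, kind, addr):
        """Analysis routine: called once per traced event of the given kind."""
        block = addr // self.line_bytes
        index = block % self.num_lines
        self.refs[kind] += 1
        if self.tags[index] != block:
            self.tags[index] = block
            self.misses[kind] += 1

def instrumented_sum(data_base, n, sim):
    """Stand-in for an instrumented test program that sums n 8-byte words."""
    pc = 0x1000
    total = 0
    for i in range(n):
        sim.reference("ifetch", pc)                 # fetch of the load instruction
        sim.reference("load", data_base + 8 * i)    # the data access itself
        sim.reference("ifetch", pc + 4)             # fetch of the add instruction
        total += i
    return total

sim = CacheModel()
instrumented_sum(0x8040, 16, sim)
print(sim.refs)     # {'ifetch': 32, 'load': 16, 'store': 0}
print(sim.misses)   # {'ifetch': 1, 'load': 4, 'store': 0}
```

When the test program returns, the counters already hold the final result, which is the sense in which the simulation is complete as soon as the program concludes.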

The  specific  form  of  analysis  to  be  “instrumented”  into  the  test  program  is  incorporated  at 
link  time  by  ATOM  using  two  files: 

1.  the  instrumentation  file,  which  instructs  ATOM  which  events  to  trace  on  and  what  event 
information  to  pass  to  the  analysis  routines,  and 

2.  the  analysis  file,  which  defines  the  various  analysis  routines  and  any  other  subsidiary  functions 
required. 

It is a very simple process to use. The test program is compiled, and then used as input to
the ATOM program with the following example command line:

%atom program.rr inst.c anal.c -o program.trace

The program is then executed and the desired analysis specified by inst.c and anal.c is performed.
This is a very simple example. There are various control flags that ATOM accepts; these are
described in both the on-line documentation and the program manuals.

For  simplicity  it  is  also  possible  to  define  tools  for  ATOM.  A  tool  description  file  is  created 
which  specifies  which  instrumentation  and  analysis  files  to  use,  as  well  as  the  various  flags  to  pass 
to  ATOM.  The  programs  are  instrumented  with  a  tool  by  using  the  command  line: 

%atom program.rr -tool eval -o program.trace

In  addition  to  simplifying  the  command  line,  defining  a  custom  tool  also  allows  additional  control 
flags  to  be  used.  The  basic  ATOM  command  line  does  not  accept  loader  flags,  for  example,  so  the 
flags necessary to include shared libraries such as math.h (-lm) cannot be used. This would normally
prevent  analysis  routines  from  accessing  such  basic  functions,  which  is  obviously  an  inconvenience. 
By  defining  a  tool,  it  is  also  possible  to  define  additional  flags  and  at  which  stage  of  instrumentation 
they should be used, allowing the use of shared libraries and other linker/loader flags.

With  the  flexibility  provided,  ATOM  is  a  versatile  tool,  but  accuracy  is  still  a  potential 
problem.  Another  strong  point  for  ATOM  is  its  robustness.  In  the  cache  example  above,  one  major 
concern  is  the  fact  that  by  adding  additional  code  to  the  program,  the  reference  stream  becomes 
skewed  by  the  additional  instructions.  This  is  automatically  compensated  for  by  ATOM  during 
instrumentation,  so  that  the  addresses  passed  to  the  analysis  routines  are  those  of  the  memory 
references  without  tracing. 

Another  area  ATOM  excels  in  is  its  care  with  shared  libraries.  Many  simulations  totally 
neglect  shared  libraries,  which  may  be  a  significant  portion  of  the  code  depending  on  the  application. 
Programs  can  be  compiled  with  the  non_shared  option,  or  ATOM  can  instrument  the  shared  libraries 
as well. To be even more exact, instrumented and non-instrumented copies of the shared library
routines are produced. This way if the instrumented program calls a shared library, the instrumented
version of the library is used. If the analysis routine calls the same library function, the
non-instrumented version is used so that the analysis is not corrupted.

Until recently, ATOM was not capable of tracing the operating system, and was not particularly
suitable for tracing multiple test programs. The latest version of ATOM, however, does allow
instrumentation  of  the  operating  system.  The  initial  tests  of  this  facility  were  performed  by  Eustace 
and  Chen  in  [20],  but  some  aspects  were  not  particularly  well  addressed.  The  primary  focus  of  this 
research  has  been  to  further  test  and  build  on  their  work  [24]. 

3.2  Operating  System  Implementation 

With  the  latest  version  of  ATOM,  it  is  now  possible  to  instrument  and  study  the  operating 
system,  specifically  the  OSF  kernel.  It  is  treated  much  as  any  program  would  be,  albeit  a  very  large 
and  complex  one.  Because  of  the  unique  nature  of  the  operating  system,  there  are  certain  measures 
which  must  be  taken  that  are  not  required  for  a  normal  program.  Part  of  the  mechanism  used  to 
study  the  kernel  is  also  used  to  capture  traces  with  multiple  user  processes  as  well. 

3.2.1  Set  Up 

To  use  ATOM  with  the  operating  system,  some  modifications  are  usually  required  to  the 
test platform. More memory may be needed to execute the larger programs; 128MB is recommended
by  DEC.  The  larger  programs  will  also  require  more  swap  space  (256MB  recommended),  a  larger 
user  file  space,  and  an  expanded  root  partition  (up  to  60MB  depending  on  the  application).  ATOM 
version  2.20  or  later  must  be  installed,  with  the  WRL  enhancement  kit.  Both  are  available  from 
DEC  via  anonymous  FTP. 

Changes  are  necessary  to  allow  the  kernel  to  be  instrumented.  The  makefile,  normally  in 
the  /usr/sys  directory,  must  be  modified  and  the  kernel  remade.  The  two  modifications  required 
are: 

1.  The  LDFLAG  line  must  have  the  -ncr  flag  removed.  This  flag  removes  the  compact  relocation 
records,  and  is  not  compatible  with  ATOM. 

2. The ALPHA_TEXTBASE must be increased to account for the larger kernel size. This value
represents  the  amount  of  space  in  memory  allocated  for  the  kernel  text,  usually  set  at  h230000. 
Instrumentation  increases  the  size  of  the  kernel  so  this  value  must  be  increased  accordingly. 
The  required  increase  will  vary,  so  occasionally  the  kernel  must  be  generated  twice.  First  a 
rough  estimate  of  the  necessary  increase  is  used  to  make  a  kernel  which  is  instrumented.  The 
nm -B command can then be used to calculate the actual value needed. If it is too small, the
kernel will crash, and if it is too large, memory may be wasted. For the work performed here,
a value of h2C00000 was used.


Once  the  makefile  has  been  altered,  a  new  kernel  is  created  by  the  sequence  of  commands: 

#make clean
#make  depend 
#make 

These  commands  must  be  executed  as  root;  using  the  sudo  utility  is  not  possible  as  the  kernel  will 
not  be  made  correctly.  During  testing  it  was  useful  to  have  multiple  kernels  available  with  different 
ALPHA_TEXTBASE  values  as  needs  changed.  If  multiple  kernels  are  made,  it  is  necessary  to  rename 
the existing kernels before a new one is created as all existing files of the form vmunix*.* are erased
during the make process. The new kernels are then instrumentable as any other program.

3.2.2  Programming 

The  act  of  instrumentation  inserts  function  calls  into  the  test  program.  These  functions  are 
executed  as  each  event  is  reached  during  program  execution,  performing  the  desired  analysis.  For  a 
cache  simulator,  those  events  are  instruction  fetches,  data  reads,  and  data  writes.  At  each  memory 
reference,  the  address  referenced  is  passed  to  the  analysis  function  for  processing  in  the  cache  model. 
Additional  functions  are  used  at  program  start  and  end  to  initialize  the  simulation  parameters  and 
report the simulation results. The various functions and the instrumentation are defined in the two
ATOM  files  mentioned  previously  for  both  the  kernel  and  test  programs. 

To  incorporate  the  operating  system  into  the  analysis,  it  is  necessary  for  the  operating  system 
and  test  program  to  share  data.  The  cache  state  must  be  accessible  to  both  programs,  as  well  as 
other  counters  and  synchronization  flags.  This  sharing  can  be  accomplished  via  the  /dev/kmem  or 
/dev/mmap  utilities.  The  shared  data  is  local  to  the  kernel.  When  the  test  program  begins,  either 
of  the  utilities  is  used  to  map  the  shared  data  into  the  test  program’s  address  space,  where  it  can 
be  accessed  via  a  pointer.  Now  the  two  processes  have  a  common  data  structure  that  is  the  core  of 
the  simulation.  To  use  these  utilities,  there  are  two  requirements.  First,  the  test  programs  must  be 
run  as  root  to  access  the  /dev/  files.  Second,  two  copies  of  the  kernel  must  be  created.  One  is  the 
executable  which  is  actually  loaded,  the  other  is  a  debug  version  which  contains  the  symbol  table 
information necessary to perform the mapping. The debug version stays in the same directory as
the test programs.
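The mapping step can be sketched with the POSIX mmap call. This is an illustration under stated assumptions rather than the code used in this work: map_shared is a hypothetical helper, the path would be one of the memory device files, and the offset would be the address of the shared structure as read from the symbol table of the debug kernel. The offset is rounded down to a page boundary because mmap requires page-aligned file offsets.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Map len bytes starting at byte offset where of the file path and
   return a pointer to them, or NULL on failure. The mapping is shared,
   so writes are visible to every process that maps the same region.
   The sketch never unmaps; the mapping lives as long as the program. */
void *map_shared(const char *path, off_t where, size_t len)
{
    long page = sysconf(_SC_PAGESIZE);
    off_t base;
    size_t skew;
    void *p;
    int fd;

    if (page <= 0)
        return NULL;
    base = where - (where % page);   /* page-aligned start of the mapping */
    skew = (size_t)(where - base);   /* distance from base to the data */

    fd = open(path, O_RDWR);
    if (fd < 0)
        return NULL;
    p = mmap(NULL, len + skew, PROT_READ | PROT_WRITE, MAP_SHARED, fd, base);
    close(fd);                       /* the mapping survives the close */
    if (p == MAP_FAILED)
        return NULL;
    return (char *)p + skew;
}
```

The returned pointer can then be cast to the type of the shared structure, giving the test program and the kernel a common view of the simulation state.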

The ability to share data is also the key to capturing traces from multiple processes. As
described  above,  data  is  captured  from  two  processes,  the  kernel  and  the  user  program.  As  will 
be  seen,  the  same  technique  can  be  used  to  increase  the  number  of  processes  being  captured.  The 
example  above  uses  shared  cache  state  data,  but  any  set  of  data  may  be  shared  to  provide  the  desired 
capture  information. 

The  instrumentation  and  analysis  files  are  not  substantially  different  for  the  kernel  and  user 
programs.  For  the  kernel,  a  test  must  be  used  to  ensure  that  certain  procedures  are  not  instrumented 
(see  below).  For  the  test  program,  the  shared  data  must  be  mapped  at  program  start  and  the  data 
recorded  at  program  end.  Otherwise,  the  analysis  functions  may  be  more  or  less  the  same.  For  the 
cache  simulator,  a  process  identification  value  is  passed  with  the  address  so  that  the  sending  process 
is  recognizable. 

Figure  1  shows  logically  how  the  original  code  and  analysis  routines  work  together  to  perform 
the  desired  analysis,  in  this  case  the  cache  simulator. 


Figure 1: Program Block Diagram

3.2.3  Execution 


Once  the  required  files  are  written,  the  implementation  is  not  substantially  different  from 
that  of  any  other  test  program.  The  two  instrumented  versions  of  the  kernel  are  produced  with  two 
slightly  different  command  lines.  For  the  executable: 

% atom vmunix kern.inst.c kern.anal.c -Xkernel -Xgprog -o vmunix.trace

and for the debug version:

% atom vmunix kern.inst.c kern.anal.c -Xkernel -g -o vmunix.debug

The  various  test  programs  are  also  instrumented  as  described  above.  The  executable  version 
of the kernel is moved to root, and the system is shut down with the # shutdown -h now command.
Using boot -fl i, the system is restarted and the instrumented kernel is specified and loaded. The
testbed is frequently shut down, so it was helpful to have a dedicated system for this research so that
other  work  was  not  interrupted.  Once  the  kernel  is  running  at  the  desired  execution  level,  the  test 
programs  are  then  executed  normally,  performing  the  analysis.  It  is  recommended  that  a  batch  file 
be  used  to  run  test  programs  to  simplify  testing. 

3.3  Problem  Areas 

3.3.1  ATOM  Limitations 

Certain  characteristics  of  ATOM  define  limitations  on  the  instrumentation  which  can  be 
used  within  the  Unix  kernel. 

•  Since  it  is  the  operating  system,  tracing  cannot  be  based  on  the  program  end  event. 

•  Certain  kernel  procedures  cannot  be  instrumented.  These  are  the  locore,  lockprim,  and 
spl  libraries,  which  account  for  only  132  out  of  10,678  kernel  procedures  so  the  error  induced 
should  be  negligible. 

•  Floating  point  numbers  cannot  be  used  within  the  kernel. 

•  The  ATOM  model  used  when  simulating  dynamic  memory  allocation  is  not  accurate  within 
the  kernel,  so  analysis  of  this  aspect  of  program  execution  is  suspect. 

• No system call interfaces can be used within the kernel.

Most of these limitations are not particularly significant, although the last is inconvenient. Without
system calls, file I/O is not possible, which precludes using a file to set evaluation parameters. This
makes  it  very  difficult  to  dynamically  define  analysis  parameters,  so  in  many  cases  the  programs  and 
operating  system  must  be  re-instrumented  for  each  desired  evaluation  (i.e.  a  separate  run  for  each 
cache  configuration).  Many  other  shared  library  routines,  such  as  mathematical  functions,  are  also 
unavailable.  As  future  versions  of  ATOM  are  released,  hopefully  some  of  these  shortcomings  will  be 
addressed. 

3.3.2  Kernel  Limitations 

Working  with  the  kernel  also  entails  certain  problems,  especially  for  a  programmer  unfamiliar 
with  the  operating  system  environment.  The  kernel  is  difficult  to  manipulate,  requiring  special  access 
privileges.  The  critical  nature  of  the  program  requires  careful  handling,  although  based  on  previous 
work,  instrumentation  errors  will  not  damage  the  system  —  a  kernel  improperly  instrumented  will 
usually  not  even  boot.  The  primary  difficulty  of  working  with  an  operating  system  is  the  difficulty 
in  debugging.  Most  debugging  tools  cannot  be  used  to  debug  a  kernel,  and  many  of  the  error 
messages  generated  are  cryptic.  Initial  testing  of  instrumentation  code  should  be  done  on  generic 
user  programs,  and  only  when  working  on  that  level  should  it  be  attempted  on  the  kernel.  This 
provides  better  checking,  and  a  much  faster  debug  and  test  cycle.  Working  with  the  kernel  is  a  slow 
process.  Making  a  new  kernel  takes  up  to  8  minutes,  and  each  instrumentation  can  take  as  much, 
if not more, time. Even assuming a new kernel is not required, testing a kernel usually takes about
20-30  minutes  (as  compared  to  the  almost  instantaneous  results  from  a  simple  user  program).  Even 
with  debugging  on  a  user  program,  many  problems  will  only  appear  in  the  kernel,  so  in  general, 
development  is  very  slow.  Some  of  this  may  have  been  due  to  system  limitations,  but  only  a  minor 
improvement  should  be  expected  with  better  resources. 

There  were  three  obscure  errors  found  regularly  during  kernel  testing: 

1.  KSP  INVAL 

2.  bootstrap  address  collision:  image  loading  aborted 

3.  trap:  invalid  memory  access  from  kernel  mode 

The  first  error  can  occur  when  the  kernel  is  loaded  or  during  execution.  This  is  roughly  equivalent 
to a segmentation violation which is normally caused by a misuse of pointers. This error may
also be caused by running out of memory, if there is not enough stack or heap for the kernel to
execute.  The  second  message  always  appears  during  kernel  loading.  This  is  caused  by  an  incorrect 
ALPHA_TEXTBASE  assigned  in  the  makefile.  The  nm  -B  command  should  be  used  to  determine  the 
correct  value  and  the  kernel  remade.  The  final  error  always  occurs  during  test  program  execution. 
This  was  an  intermittent  error  and  the  cause  was  never  found,  even  after  conferring  with  DEC. 
The error always occurred in the kernel's thread_preempt routine, which suggests it is related to
interrupts  and/or  context  switching.  The  error  was  linked  to  the  size  of  the  test  programs  being 
executed.  A  single  large  program  could  cause  the  error  (such  as  Xlisp),  or  combinations  of  smaller 
programs  (such  as  Alvinn  with  any  other  program,  or  Compress,  GCC,  and  Espresso  all  together). 
Since  it  occurred  with  only  one  test  program  running,  it  cannot  be  caused  by  having  two  or  more 
test programs sharing the kernel's data structure. The memory of the testbed was increased from
64MB to 160MB with no effect. The hardclock scaling (see below) was reduced to its minimum value of
50%  with  no  effect.  To  isolate  the  problem  it  will  be  necessary  to  complete  an  examination  of  the 
kernel  which  is  beyond  the  scope  of  this  work.  The  most  likely  cause  is  the  threaded  execution  of 
the kernel and the lack of firm control within the analysis routines, although it is possible that the
hardclock  scaling  is  the  culprit. 

3.3.3  Program  Size 

One  common  problem  with  any  software-based  tracing  method  is  the  increase  in  program 
size. Since the program is instrumented with not only tracing information but also analysis functions,
this is a greater concern when ATOM is used. The normal OSF kernel is about 8-9MB. If
the same kernel is instrumented with a function call at every instruction, and an additional call
at every data read or write, the kernel will grow to 92.7MB and require an ALPHA_TEXTBASE of
about h5A00000. A kernel this size could not even be loaded on the test machine. By instrumenting
groups of instructions (and still each data reference), the kernel is only about 46MB with an
ALPHA_TEXTBASE of h2C00000, which is executable. Instrumenting just instruction or data accesses
will  reduce  the  size  by  about  half.  It  is  important  to  note  that  the  size  of  the  instrumented  kernel  is 
primarily  a  function  of  the  degree  of  instrumentation,  not  analysis.  Changing  the  amount  of  analysis 
processing  only  varied  the  size  of  the  kernel  by  about  4MB. 

Besides  the  strain  on  the  system  from  working  with  such  a  large  kernel,  it  also  raises  an 
accuracy issue. The kernel used in our tests left only 15MB of memory available for test programs,
yet this is supposed to be simulating a system with about 50MB of free memory. The situation is
even  worse  when  the  fact  that  each  test  program  is  also  instrumented  and  significantly  larger  than 
normal  is  considered.  Such  large  programs  require  more  paging,  which  in  turn  skews  the  amount 
of  overhead  each  program  requires.  For  more  accurate  results,  the  amount  of  memory  should  be 
increased  proportionately. 

3.3.4  Execution  Speed 

Execution  speed  becomes  critical  when  considering  the  instrumented  kernel.  The  inclusion 
of tracing can reduce the execution speed of a program by an order of magnitude [8], more so with
the  additional  processing.  A  slowdown  of  this  magnitude  may  not  be  tolerated  by  the  operating 
system.  At  some  point,  the  kernel  becomes  so  slow  that  it  cannot  function  correctly.  Interrupts  and 
service requests may be generated faster than they can be serviced, effectively hanging the system
during  boot  up.  This  can  also  be  seen  during  test  program  execution  if  too  many  processes  are 
executed  —  the  kernel  simply  thrashes  and  the  system  stalls.  Even  assuming  the  operating  system 
does  work,  basic  tasks  can  take  an  inordinate  amount  of  time.  Booting  a  kernel  with  a  basic  cache 
simulator  in  multi-user  mode  and  logging  on  took  over  an  hour  in  one  test.  Several  methods  have 
been  explored  to  accelerate  the  kernel  and  counter  this  problem. 

The  first  is  to  use  a  different  programming  style  for  the  kernel  analysis  routines.  Only 
the  bare  minimum  code  necessary  to  perform  the  desired  task  is  used.  No  additional  function 
calls  are  made  beyond  the  initial  call  to  the  analysis  routine,  eliminating  extra  switching.  Any 
additional  computation  is  incorporated  into  the  primary  function,  even  if  this  requires  duplicating 
code. Loops should be used sparingly and the iterations minimized, and any other time-consuming
operations  should  be  optimized.  Minimizing  data  storage  may  help,  but  is  not  a  primary  factor. 
These  techniques  will  definitely  speed  execution,  particularly  eliminating  function  calls,  so  even 
though  some  of  these  changes  introduce  poor  programming  practice  from  a  software  engineering 
standpoint,  they  need  to  be  used. 

If the kernel boots, but is too slow to execute the test programs in a multi-user environment,
the first solution is to reduce the number of additional processes the kernel may be executing.
Programs being run by other users or not part of the test should be eliminated. Other background
processes associated with the operating system can also be killed. In multi-user mode, there are additional
background processes executing, such as LAT, cron, network software, and printer daemons.
Many of these are not necessary for the tests and can be removed; the fewer processes running,
the faster the kernel will be.

If the kernel is still too slow, or will not boot in multi-user mode, it is possible to run the
programs in single user mode. This effectively eliminates all extraneous processes and dedicates
the system to the instrumented test programs. When the system boots to the first # prompt, do
not start the higher execution level (the command is ^D). The local disks can be mounted using
# mount -at ufs so that the test programs can be accessed (assuming they are on a local disk). The
simulations  can  then  be  executed  normally.  If  multiple  test  programs  are  desired,  they  can  be  run 
concurrently  by  using  background  mode  (&)  for  each.  Using  single  user  mode  is  significantly  faster, 
and  can  be  considered  an  advantage  or  disadvantage.  It  is  true  that  most  of  the  processes  that 
would be executing in a “real” environment are absent, lessening the accuracy; however, it also lets
the  analysis  focus  on  the  operating  system  overhead  associated  with  a  particular  program  without 
all  the  other  extraneous  references.  The  use  of  single  user  mode  will  depend  on  both  the  constraints 
of  the  kernel  and  the  desired  evaluation.  Single  user  mode  may  also  limit  the  choice  of  test  programs. 
Some  programs,  such  as  SC  in  the  SPEC  benchmark  suite,  require  specific  interfaces  which  may  not 
be  available  and  so  cannot  be  executed. 

If  the  kernel  is  so  slow  that  it  cannot  even  be  booted,  it  may  be  necessary  to  disregard  some 
of  the  real-time  interrupts  that  are  stalling  the  system.  The  main  interrupt  of  concern  is  the  system 
call  to  the  hardclock.  The  number  of  the  hardclock  calls  which  are  performed  can  be  scaled  by  using 
assembly  code  [10].  This  allows  a  certain  percentage  of  the  interrupts  to  be  ignored.  This  has  by 
far  the  most  significant  impact  on  kernel  speed,  and  should  be  sufficient  to  allow  most  programs  to 
execute. 
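The scaling idea can be illustrated in C, although the scaling in this work was done in assembly code as described in [10]. The structure and function names below are hypothetical. An accumulator is advanced on every clock interrupt and the interrupt is serviced only when the accumulator wraps, so a fixed percentage of the interrupts is ignored.

```c
#include <assert.h>

/* Hypothetical scaler: services keep_percent out of every 100 ticks. */
struct tick_scaler {
    unsigned keep_percent;   /* 1..100 */
    unsigned acc;            /* running accumulator, always below 100 */
};

void scaler_init(struct tick_scaler *s, unsigned keep_percent)
{
    assert(keep_percent >= 1 && keep_percent <= 100);
    s->keep_percent = keep_percent;
    s->acc = 0;
}

/* Returns 1 if this clock tick should be serviced, 0 if it is ignored. */
int scaler_tick(struct tick_scaler *s)
{
    s->acc += s->keep_percent;
    if (s->acc >= 100) {
        s->acc -= 100;
        return 1;
    }
    return 0;
}
```

With keep_percent set to 50, every second tick is serviced; with 25, every fourth.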

The  speed  factor  also  raises  a  question  of  accuracy.  Any  event  that  is  based  on  an  absolute 
timing  mechanism  (such  as  real  time  interrupts)  will  not  be  affected  by  instrumentation.  That 
means  that  as  an  instrumented  program  executes,  it  sees  a  disproportionate  number  of  these  events 
during  its  execution.  The  hardclock  scaling  mentioned  above  will  partially  resolve  this  issue,  but  it 
has  not  been  fully  verified.  Another  accuracy  factor  is  the  number  of  context  switches.  If  a  system 
uses  a  maximum  execution  interval,  the  frequency  of  context  switches  seen  by  an  instrumented 
test  program  will  also  be  out  of  proportion.  One  measure  used  in  [8]  is  to  increase  the  maximum 
execution interval defined by the task scheduler.

3.3.5  Re-entrance 


One  of  the  most  complex,  and  possibly  significant,  aspects  of  working  with  the  kernel  is  its 
multi-threaded  nature.  System  calls,  interrupt  service  routines,  and  other  overhead  functions  are 
all  separate  processes  to  be  executed  by  the  processor.  They  may  be  executed  at  any  time  during 
program  or  analysis  execution.  This  causes  a  problem  of  guaranteeing  the  integrity  of  the  analysis 
data.  For  example,  during  execution  of  the  test  program,  the  analysis  routine  is  called.  While 
the  analysis  routine  is  still  processing  that  particular  event,  an  interrupt  occurs.  The  interrupt 
will  supersede  the  analysis  routine  and  the  interrupt  service  routine  will  be  executed.  The  service 
routine  is  part  of  the  kernel,  and  is  also  instrumented.  Therefore,  as  the  service  routine  executes, 
it  also  generates  events  and  calls  to  the  analysis  routines,  before  the  prior  analysis  routine  call  has 
completed.  Since  all  analysis  routines  access  a  common  data  structure,  the  actual  state  of  the  data 
becomes  non-determinate  and  the  evaluation  results  inaccurate.  Consider  an  analysis  routine  which 
is  interrupted  in  the  middle  of  incrementing  a  counter.  The  counter  is  loaded  and  incremented,  but 
has  yet  to  be  stored.  The  second  execution  of  the  analysis  routine  also  increments  the  counter,  so 
it  loads,  increments,  and  stores  the  data.  The  problem  is,  the  value  the  second  routine  loaded  was 
incorrect,  since  the  first  routine  never  had  a  chance  to  store  the  new  value  of  the  counter.  When  the 
first  routine  does  return  to  execution,  it  then  writes  the  value  of  the  counter,  which  eliminates  any 
changes to the counter that occurred during the interruption. Analysis functions must be designed
explicitly to handle such concerns. Such functions are called re-entrant, since they can effectively be “entered” multiple
times without loss of integrity.
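The counter example above can be replayed deterministically. The sketch below does not use a real interrupt; it performs the load, increment, and store steps of the two analysis calls by hand, in the order described, so that the lost update is visible. The function name is hypothetical.

```c
#include <assert.h>

/* Replays the interleaving of two analysis calls on one shared counter.
   Returns the final value of the counter. */
long replay_lost_update(void)
{
    long counter = 0;       /* shared analysis data */
    long first;             /* register copy held by the first call */
    long second;            /* register copy held by the second call */

    first = counter;        /* first call: load */
    first = first + 1;      /* first call: increment, not yet stored */

    /* The interrupt occurs here and the second call runs to completion. */
    second = counter;       /* loads the stale value 0 */
    second = second + 1;
    counter = second;       /* stores 1 */
    assert(counter == 1);

    counter = first;        /* first call resumes and stores its own 1 */
    return counter;         /* two events occurred, but only one is counted */
}
```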

Further  data  thrashing  is  possible  during  a  context  switch.  At  a  context  switch,  the  current 
state  of  the  processor  is  saved  so  that  when  that  process  returns  to  execution,  it  is  started  from 
the  point  where  it  was  swapped  out.  This  current  status  is  usually  represented  by  data  such  as  the 
registers  and  allocation  tables.  In  a  threaded  program,  however,  there  may  be  data  that  is  visible  to 
all  processes  and  not  stored  at  the  context  switch.  If  this  data  is  relevant  to  the  state  of  a  particular 
process,  it  must  be  explicitly  defined  as  such.  For  instance,  one  process  sets  a  variable  in  the  global 
data.  This  data  is  carried  over  a  context  switch  and  is  now  visible  to  the  next  process,  where  it  may 
or  may  not  affect  its  execution.  If  the  communication  is  intentional,  care  must  be  used  so  that  a 
context  switch  performed  in  the  act  of  setting  the  variable  will  not  disrupt  the  execution.  For  this 
reason, the scope of data should be kept as local as possible, and any global data must be protected.

Re-entrance  is  normally  achieved  through  synchronization.  Each  time  a  particular  function 
is  entered,  it  must  determine  if  it  is  unique  or  if  there  are  other  instances  of  that  function  in 
mid  execution.  This  is  accomplished  by  a  semaphore  or  other  form  of  signal  which  is  visible  to 
all  instances  of  every  function.  Such  global  data  can  be  used  to  coordinate  the  activities  of  each 
function,  the  actual  implementation  depending  on  the  desired  effect.  For  the  synchronization  to  be 
effective,  it  must  be  an  atomic  operation.  The  two  acts  of  checking  the  semaphore  and  setting  it  if 
it  is  not  already  set  cannot  be  interrupted,  otherwise  synchronization  may  be  lost.  For  example,  a 
process  checks  the  signal  and  determines  that  it  is  the  first  instance  of  that  analysis  function.  Before 
it can set the signal, however, an interrupt occurs and the function is called again. This instance also
checks  the  signal  and  determines  that  it  is  the  first,  conflicting  with  the  legitimate  first  instance. 
Normal  instructions  do  not  provide  this  capability,  as  an  interrupt  may  quite  easily  occur  between 
testing  and  changing  a  variable.  Instead,  particular  commands  must  be  used,  which  will  depend  on 
the  platform  used. 
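As an illustration only, the check and set can be written with the C11 atomic_flag type, which performs the test and the set as a single indivisible operation. This is a stand-in for the platform-specific instructions referred to above, not the mechanism used in this work, and it is written for user-level code. The function names are hypothetical, and the policy of skipping a nested call is a simplification that loses the references of the nested call.

```c
#include <stdatomic.h>

static atomic_flag analysis_busy = ATOMIC_FLAG_INIT;
static atomic_ulong nested_events;   /* calls that arrived while busy */

/* Returns 1 if the caller is the only active instance and may proceed,
   0 if another instance is already in mid execution. */
int analysis_try_enter(void)
{
    /* Test and set happen as one atomic step, so no interrupt can
       fall between checking the flag and setting it. */
    if (atomic_flag_test_and_set(&analysis_busy)) {
        atomic_fetch_add(&nested_events, 1UL);
        return 0;
    }
    return 1;
}

/* Must be called by the instance that succeeded in entering. */
void analysis_leave(void)
{
    atomic_flag_clear(&analysis_busy);
}

unsigned long analysis_nested_count(void)
{
    return atomic_load(&nested_events);
}
```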

The  task  of  making  analysis  routines  re-entrant  is  further  complicated  by  the  fact  that  the 
analysis  routines  are  being  executed  within  the  kernel.  There  are  many  libraries  of  thread  control 
and synchronization routines such as pthreads.h, semaphore.h, signal.h, and others, but these
are mostly services provided by the kernel, not available within the kernel. To make the analysis
routines  fully  re-entrant,  it  will  be  necessary  to  incorporate  the  same  synchronization  used  within 
the  kernel,  which  is  not  well  documented. 

In some cases the error introduced by data corruption is small enough that it can be tolerated.
In other cases, contrived re-entrance can be incorporated with basic programming to ensure
some protection. For a detailed analysis of a multithreaded program such as the operating system,
however, full re-entrance will be required. This problem has not been addressed before, and will
require substantial investigation before it is adequately resolved.

3.3.6  Reference  Stream  Accuracy 

The  threaded  nature  of  the  operating  system  also  raises  accuracy  concerns.  Through  testing, 
it  has  been  determined  that  there  is  no  duplication  of  kernel  software  similar  to  that  used  for  shared 
libraries  in  single  process  simulation.  This  means  that  if  the  analysis  routine  in  the  test  program 
makes  a  system  call  or  instigates  an  interrupt,  then  the  instrumented  kernel  service  routine  is 
executed. This in turn generates additional references for the simulation which would not have been
generated in the untraced version of the program. This is a significant concern, particularly if the
execution of the operating system is to be analyzed in detail. Since all real-time interrupt routines
are instrumented, they generate additional references as well, since there are proportionately more
interrupts per program execution time. To counter this, there must be an explicit mechanism to
determine  the  cause  of  the  operating  system  references  and  disregard  the  additional  references  — 
possibly  something  to  incorporate  as  an  aspect  of  the  re-entrance  mechanism. 

3.3.7  Portability 

The  final  area  of  concern  is  ATOM’s  portability.  One  criticism  of  many  of  the  past  methods 
was  their  lack  of  portability.  Some  are  custom  tools,  and  many  were  tied  to  a  specific  architecture  or 
program.  It  is  unfortunate  that  ATOM  is  no  exception.  ATOM  has  only  been  implemented  for  the 
DEC  Alpha  workstations  and  the  operating  system  aspect  can  only  be  used  with  DEC  OSF/1.  The 
one  advantage  ATOM  does  have  is  its  flexibility.  Since  it  is  a  generic  framework  based  on  software, 
that  framework  can  be  reconstructed  for  other  platforms  or  operating  systems.  The  tools  already 
created  can  then  be  used  to  compare  results  across  systems.  Because  of  this  it  is  hoped  that  one 
day ATOM will be available for other systems, which is entirely possible.

4  Test  Methodology 

4.1  Cache  Model 


Fundamentally,  a  cache  is  simply  a  device  used  to  store  subsets  of  a  large  data  pool  for 
quick  access.  This  type  of  structure  may  be  found  in  a  TLB  [49],  memory  mapping  tables  [52],  or 
within  an  instruction  pipeline  [27].  The  most  common  form,  and  that  which  is  modeled  here,  is  a 
memory  cache  used  to  improve  average  memory  access  times  by  storing  data  mapped  in  from  main 
memory.  The  design  and  execution  of  such  caches  have  been  rigorously  studied,  and  are  described 
in  a  variety  of  sources  [22,  36,  52]. 

The goal for this research was to develop a flexible cache simulator that incorporates reference
streams from multiple processes, including the operating system. This was built on the
framework outlined in the previous section, using a common data structure in the kernel’s address
space to provide synchronization and store the cache state. The test program mapped this structure
into the program’s address space by accessing the /dev/mem facility, so all test programs must
be executed as root (a moot point in single user mode). To perform a single process simulation for
comparison, the code was slightly modified so that the cache data was local to the test program and
external communication and synchronization were no longer necessary. The code used is provided
in  appendix  A,  but  a  summary  of  the  most  significant  characteristics  is  provided  below. 

The  default  ATOM  tools  only  incorporate  one  test  program  and  the  operating  system.  By 
using  the  same  technique,  however,  it  is  possible  to  extend  a  simulation  to  an  arbitrary  number  of 
programs.  Each  program  simply  maps  the  same  kernel  data  structure  into  its  space  via  a  pointer  so 
each  process  now  has  access  to  the  same  common  memory  structure.  In  this  way,  simulations  can 
be conducted with multiple test programs together with the operating system.

For  simplicity,  the  various  analysis  files  were  implemented  as  custom  ATOM  tools.  This 
allowed  the  use  of  shared  library  functions  such  as  math.h  within  the  analysis  functions,  as  well  as 
simplified  the  act  of  instrumenting  each  test  program.  The  tools  defined  for  this  research  are: 

kexe This specified the kernel instrumentation and analysis programs with the ATOM flags necessary
to produce an executable version of the kernel.

kdbg Kdbg also specified the kernel instrumentation and analysis programs, but with the ATOM
flags required to produce the debug version of the kernel used to map memory addresses.

user#  The  final  tool  was  used  for  the  test  programs.  The  #  symbol  represents  a  digit,  1,  2,  or  3, 
which  identifies  which  test  program  is  being  instrumented.  The  only  difference  is  the  process 
identification  number  assigned. 

The  program  captures  both  instruction  and  data  references  to  be  able  to  model  both  split 
and  unified  instruction  and  data  caches.  This  is  relatively  simple  for  a  RISC  architecture;  each 
instruction  generates  one  instruction  reference,  and  all  data  references  are  one  of  two  possibilities,  a 
data  load  or  data  store.  Instrumenting  every  instruction  generates  too  large  a  kernel  to  be  executed 
on  our  system.  Instead,  instructions  are  instrumented  within  basic  blocks  in  groups  of  8  or  less. 
This  both  decreases  the  size  of  the  programs,  and  speeds  their  execution.  The  processing  routine  is 
passed  the  initial  address  and  the  number  of  instructions  that  follow  to  simplify  processing.  With 
this  information,  the  addresses  of  each  instruction  can  be  recreated  and  processed.  It  is  also  possible 
to only instrument each basic block, but grouping instructions presents a problem. To simulate a
unified  cache,  the  interleaving  of  instruction  and  data  references  in  the  same  stream  is  required. 
If  instructions  are  instrumented  in  groups,  the  actual  interleaving  cannot  be  reconstructed.  Data 
references  could  be  out  of  place  by  as  many  references  as  the  number  of  instructions  grouped  together. 
For  this  reason,  instructions  should  be  instrumented  individually  if  possible.  Using  smaller  blocks  of 
instructions  minimizes  this  error,  and  also  allows  another  simplification  in  processing.  If  the  groups 
of  instructions  are  smaller  than  the  cache  block  size,  then  only  one  reference  need  be  processed  for 
the entire group and the reference counter incremented by the group size. A small margin of error
is  introduced  because  of  the  assumption  that  instructions  are  aligned  along  blocks,  but  this  will  be 
minimal  as  block  size  increases.  This  was  used  in  the  simulator,  limiting  the  minimum  cache  block 
size  to  32  bytes  given  a  4  byte  instruction. 
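The two ways of processing a group can be sketched as follows. The function names are hypothetical, and the code in appendix A may differ. The first function recreates every instruction address from the initial address and the count; the second applies the simplification, returning the single block address to look up and crediting the reference counter with the whole group.

```c
#include <assert.h>

#define INSTR_BYTES 4UL   /* fixed instruction width assumed in the text */
#define MAX_GROUP   8U    /* largest instruction group used */

/* Recreate each instruction address of a group. Returns the count. */
unsigned expand_group(unsigned long start, unsigned count,
                      unsigned long out[MAX_GROUP])
{
    unsigned i;

    assert(count >= 1 && count <= MAX_GROUP);
    for (i = 0; i < count; i++)
        out[i] = start + i * INSTR_BYTES;
    return count;
}

/* Simplified form: valid only when the group is no larger than a cache
   block. One block address is returned for lookup and the counter is
   advanced by the group size. A group that straddles a block boundary
   is charged to its first block, which is the small error noted above. */
unsigned long credit_group(unsigned long start, unsigned count,
                           unsigned long block_bytes,
                           unsigned long *ref_counter)
{
    assert(count >= 1 && count <= MAX_GROUP);
    assert(block_bytes != 0 && (block_bytes & (block_bytes - 1)) == 0);
    assert(count * INSTR_BYTES <= block_bytes);
    *ref_counter += count;
    return start & ~(block_bytes - 1);
}
```

With 8 instructions of 4 bytes each, the largest group spans 32 bytes, which is why the simplification sets the minimum block size at 32 bytes.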

Each  reference  is  applied  to  its  appropriate  cache  according  to  the  cache’s  characteristics. 
The  caches  themselves  are  defined  by  4  or  7  parameters,  depending  on  cache  type: 

Type  Either  split,  containing  separate  instruction  and  data  caches  (type  =  1),  or  unified,  having  a 
single  cache  for  both  types  of  references  (type  =  0). 

Cache Size The cache size in number of bytes. The size is specified as an area, so that the number
of cache lines in a given cache is determined by:

\[ \text{cache lines} = \frac{\text{cache size}}{\text{block size} \times \text{associativity}} \]

Cache size is specified independently for each section of a split cache, as are the last two
parameters.


Block  size  The  size  in  bytes  of  a  cache  block,  which  is  the  unit  of  transfer  between  the  cache  and 
memory. 

Associativity  The  number  of  blocks  per  cache  line. 
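As a worked example of how these parameters combine, the number of cache lines can be computed as below. The function name is hypothetical. An 8192 byte cache with 32 byte blocks and an associativity of 2 has 8192 / (32 * 2) = 128 lines; the same cache with an associativity of 1 (direct mapped) has 256 lines.

```c
#include <assert.h>

/* Number of cache lines: cache size / (block size * associativity). */
unsigned long cache_lines(unsigned long cache_bytes,
                          unsigned long block_bytes,
                          unsigned long associativity)
{
    assert(block_bytes > 0 && associativity > 0);
    /* The cache size must divide evenly into lines. */
    assert(cache_bytes % (block_bytes * associativity) == 0);
    return cache_bytes / (block_bytes * associativity);
}
```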

For most simulations of this type, such parameters must be statically defined during compilation,
which makes repeated tests with a range of parameters difficult. This is because the kernel cannot
access file I/O, so simulation data cannot be loaded when the program starts. This program instead
defines  maximum  parameters  during  compilation  and  memory  is  allocated  for  a  worst  case  condition. 
When  the  operating  system  is  started,  the  simulation  also  starts  but  with  a  flag  so  that  all  references 
are  discarded.  When  the  first  test  program  is  executed,  it  loads  the  desired  cache  parameters  from 
a  file  and  stores  them  into  the  cache  structure,  thereby  allowing  dynamic  definition  of  simulation 
parameters.  Once  this  is  completed,  reference  capture  is  enabled  and  the  simulation  commences. 
This  also  speeds  up  the  operating  system  when  a  simulation  is  not  actually  being  performed,  since 
after  all  test  programs  have  completed  the  flag  is  restored  and  the  simulation  portion  disabled. 
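The sequence above can be sketched as a small state structure. All names and limits below are hypothetical and stand in for the shared structure described earlier. The maxima are fixed at compile time, capture starts disabled, the first test program installs parameters that it has read from a file at user level, and capture is disabled again when testing ends.

```c
/* Hypothetical compile-time ceilings; memory is sized for the worst case. */
#define MAX_CACHE_BYTES (1UL << 20)
#define MAX_ASSOC       8UL
#define MIN_BLOCK_BYTES 32UL

struct sim_params {
    unsigned long cache_bytes;
    unsigned long block_bytes;
    unsigned long assoc;
};

struct sim_state {
    volatile int capture_enabled;   /* 0: every reference is discarded */
    struct sim_params params;
    unsigned long refs_seen;
};

/* Operating system start: the simulation exists but ignores references. */
void sim_boot(struct sim_state *s)
{
    s->capture_enabled = 0;
    s->refs_seen = 0;
}

/* First test program: install parameters, then enable capture.
   Returns 0 on success, -1 if a parameter exceeds the compiled limits. */
int sim_configure(struct sim_state *s, const struct sim_params *p)
{
    if (p->cache_bytes == 0 || p->cache_bytes > MAX_CACHE_BYTES)
        return -1;
    if (p->block_bytes < MIN_BLOCK_BYTES || p->block_bytes > p->cache_bytes)
        return -1;
    if (p->assoc == 0 || p->assoc > MAX_ASSOC)
        return -1;
    s->params = *p;
    s->capture_enabled = 1;
    return 0;
}

/* Every reference: returns immediately unless capture is enabled. */
void sim_capture(struct sim_state *s, unsigned long addr)
{
    (void)addr;                     /* a real simulator would use addr */
    if (!s->capture_enabled)
        return;
    s->refs_seen++;
}

/* After the last test program: restore the flag. */
void sim_finish(struct sim_state *s)
{
    s->capture_enabled = 0;
}
```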

Other  cache  characteristics  are  constant.  These  are  programmed  into  the  simulation  and 
cannot  be  modified  without  code  changes: 

•  The  various  threads  encompassing  the  kernel  are  treated  collectively  as  a  single  process. 

•  Caches  are  virtually  addressed.  A  process  identifier  is  associated  with  each  cache  block  to 
identify  its  owning  process,  so  cache  flushes  on  context  switches  are  not  necessary.  This 
neglects  aliases,  or  multiple  virtual  addresses  to  the  same  physical  location,  but  the  effect  of 
such  shared  data  should  be  minimal  given  the  test  programs  used.  If  multiple  threads  of  a 
single  process  such  as  the  kernel  are  to  be  considered,  however,  this  cannot  be  ignored.  Using 
virtual  addresses  drastically  simplifies  the  simulation,  since  no  translation  to  physical  addresses 
is  necessary,  but  it  does  have  a  drawback.  The  virtual  addresses  for  a  program  will  depend 
on  the  system  executing  it  and  how  it  has  been  mapped  from  memory.  This  mapping  may  be 
optimized  for  a  particular  memory  system  or  the  current  execution  environment,  and  so  skew 
the  results  of  a  simulation  of  a  different  system  on  the  same  addresses.  This  must  be  accepted 
unless  the  virtual/physical  mapping  is  also  considered  in  the  model,  which  is  not  a  simple  task. 




Since  the  effect  will  be  consistent  across  all  programs  and  caches  in  the  simulation,  its  impact 
is  ignored. 

•  No prefetching is incorporated into the simulation; blocks are fetched only on demand. This is
not particularly realistic, since prefetching is a simple but powerful enhancement to cache
performance, but for an initial test of the simulation capability, it would be an unnecessary
complication.

•  All references are assumed to be the same size, accessing a single byte. This is acceptable
assuming that any words addressed do not cross cache block boundaries.

•  Mapping of addresses to cache lines is by a simple masking of the low order address bits. This
is the simplest and most common form, although other hashing algorithms are possible.

•  An allocate on write policy is used, so data writes are treated the same as reads. This is
generally the most pessimistic write policy. Its opposite is no fetch on write, in
which a data write miss is ignored by the cache and sent directly to memory [29]. Write back
versus write through considerations are ignored, as the model does not consider traffic to main
memory.

•  Set  associative  caches  use  a  least  recently  used  (LRU)  replacement  algorithm. 
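Taken together, the fixed characteristics listed above describe a fairly small model. The following Python sketch is one way to express it, under stated assumptions: the class and method names are invented for illustration, the number of cache lines is assumed to be a power of two so that masking works, and each stored block is tagged with a (process, block number) pair to stand in for the process identifier.

```python
class Cache:
    """Virtually addressed, set associative cache with LRU replacement."""

    def __init__(self, size, block_size, assoc):
        self.block_size = block_size
        self.assoc = assoc
        # Lines = bytes / (bytes per block * blocks per line); power of two assumed.
        self.num_lines = size // (block_size * assoc)
        # Each line lists its (pid, block) entries from least to most recently used.
        self.lines = [[] for _ in range(self.num_lines)]
        self.refs = 0
        self.misses = 0

    def access(self, pid, address):
        """Process one reference. Reads and writes are treated alike (allocate on write)."""
        self.refs += 1
        block = address // self.block_size
        # Line selection by masking the low order bits of the block number.
        line = self.lines[block & (self.num_lines - 1)]
        entry = (pid, block)
        if entry in line:
            line.remove(entry)   # hit: move to the most recently used position
            line.append(entry)
            return True
        self.misses += 1
        if len(line) == self.assoc:
            line.pop(0)          # evict the least recently used block
        line.append(entry)       # demand fetch only, no prefetching
        return False
```

Because the process identifier is part of the tag, the same virtual address used by two processes occupies two distinct entries, which is why no flush is needed on a context switch in this model.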

Cache  performance  is  recorded  as  reference  and  miss  totals  for  each  type  of  reference.  Totals 
are generated separately for each process for each cache. Values are reported at the end of the
simulation or, when multiple processes are run, at the end of each process. Process overwrite data is also captured,
in  the  form  of  the  total  number  of  overwrites  by  each  process  over  each  of  the  other  processes.  This 
is  accumulated  by  incrementing  a  particular  counter  identifying  the  previous  and  present  owning 
process  for  each  cache  block  overwritten.  Cache  performance  information  for  the  operating  system 
is  only  captured  during  the  execution  of  test  programs.  References  before  or  after  the  program  are 
ignored. 
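The bookkeeping described here amounts to a set of counters keyed by process and reference type, plus a matrix of overwrite counts keyed by the previous and present owner of a replaced block. A minimal Python sketch follows; the names are hypothetical, and counting replacements of a process's own blocks on the diagonal of the matrix is an assumption, since the text does not say whether those are recorded.

```python
from collections import Counter


class Stats:
    """Per-process reference and miss totals, plus process overwrite counts."""

    def __init__(self):
        self.refs = Counter()        # (pid, kind) -> references
        self.misses = Counter()      # (pid, kind) -> misses
        self.overwrites = Counter()  # (previous owner, present owner) -> blocks replaced

    def record(self, pid, kind, hit, evicted_pid=None):
        """kind is 'ifetch', 'read' or 'write'; evicted_pid is the owner of a replaced block."""
        self.refs[(pid, kind)] += 1
        if not hit:
            self.misses[(pid, kind)] += 1
            if evicted_pid is not None:
                self.overwrites[(evicted_pid, pid)] += 1

    def miss_rate(self, pid, kind):
        total = self.refs[(pid, kind)]
        return self.misses[(pid, kind)] / total if total else 0.0
```

Note that the overwrite key carries no reference type, which mirrors the limitation acknowledged later in this chapter.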

One concern was that in a multiprocess environment, execution is non-deterministic. Because
of this, multiple executions cannot be used to evaluate multiple caches, as there will be
differences  between  each  execution.  To  counter  this,  multiple  caches  with  varying  characteristics  are 
simulated  during  a  single  execution.  This  way,  cache  performance  can  be  compared  across  equivalent 
loading.  It  does  slow  down  execution,  but  accomplishes  more  with  one  run. 
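The idea of evaluating several caches against one execution can be illustrated with a short Python sketch. To keep it brief it models direct mapped caches only; the function name and the (size, block size) tuple layout are assumptions made for the example.

```python
def simulate(references, configs):
    """Feed a single reference stream to several direct mapped caches at once.

    configs is a list of (cache size, block size) pairs in bytes.
    Returns the miss count of each cache, in the same order.
    """
    caches = [
        {"block": block, "lines": size // block, "tags": {}, "misses": 0}
        for size, block in configs
    ]
    for address in references:
        # Every cache sees exactly the same reference, in the same order.
        for cache in caches:
            block = address // cache["block"]
            index = block % cache["lines"]
            if cache["tags"].get(index) != block:
                cache["misses"] += 1
                cache["tags"][index] = block
    return [cache["misses"] for cache in caches]


misses = simulate([0, 32, 64, 0, 4096, 0], [(4096, 32), (8192, 32)])
```

The inner loop is the cost mentioned in the text: each reference is processed once per cache, but all caches are guaranteed to see identical loading.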




Another concern was the threaded nature of the operating system. Because the analysis routines
may be entered from more than one thread, some form of re-entrance was required. To address this,
a flag is set upon entry to the ATOM analysis
routines. The flag is a global variable visible to all of the executing processes, so it can be used for
synchronization. If an analysis routine encounters the flag already set on entry, it immediately
exits, maintaining data integrity. By assuming that the reference which called the analysis routine
was in some way instigated by another analysis routine, this also prevents interrupts generated by
the analysis routine from contributing to the simulation reference stream. It does cause any other
interrupts which occur during analysis processing to be neglected as well. While this may seem like a
disadvantage, such real-time interrupts are normally skewed by the slowed processing, so neglecting
a portion of them is actually beneficial. This implementation is not ideal, because the flag is not set
or cleared as an atomic operation. The majority of signaling and synchronization protocols available
in programming are actually services provided by the kernel, and therefore not available to code that
is executing within the kernel. If an interrupt occurs in the process of checking or setting the flag,
the resulting behavior is undefined. This was particularly a problem during context switches, so another
mechanism was added. Not only do the analysis routines check the signaling flag, but they also check
to see if a context switch has occurred. If a context switch has occurred, the flag is automatically
reset. This is obviously a very improvised strategy and has much room for improvement, but it was
effective in regulating the reference stream enough to allow reasonably accurate simulations.
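The control flow of this guard can be shown in a few lines of Python. This is a sketch of the logic only: the names are hypothetical, a change of process identifier is used as a stand-in for detecting a context switch, and the sketch does not reproduce the non-atomic race that makes the real mechanism fragile.

```python
class AnalysisGuard:
    """Pseudo re-entrance guard: drop references that arrive while analysis is running."""

    def __init__(self):
        self.busy = False      # the global signaling flag
        self.last_pid = None   # used to notice a context switch
        self.processed = 0
        self.dropped = 0

    def analyze(self, pid, address, body=None):
        if pid != self.last_pid:
            # A context switch has occurred: reset the flag unconditionally,
            # in case it was left set by an interrupted routine.
            self.busy = False
            self.last_pid = pid
        if self.busy:
            self.dropped += 1  # assumed to be instigated by analysis code itself
            return False
        self.busy = True
        try:
            self.processed += 1
            if body is not None:
                body()         # stands in for the cache processing
        finally:
            self.busy = False
        return True


guard = AnalysisGuard()
# A reference generated while the analysis of another reference is still running.
guard.analyze(1, 0x10, body=lambda: guard.analyze(1, 0x20))
```

After the call above, the outer reference has been processed and the nested one has been dropped, which is the behavior the text relies on to keep analysis-generated references out of the stream.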

Other  aspects  of  the  code  were  dictated  by  the  use  of  ATOM.  As  mentioned  in  the  previous 
section,  all  processing  was  kept  to  a  minimum.  Loops  were  used  sparingly,  and  no  function  calls 
beyond  the  original  analysis  routine  were  used.  This  is  not  particularly  good  software  engineering 
practice, but necessary. The hardclock scaling mentioned was also incorporated, with a 90% reduction
in the number of hardclock calls. Even with these measures, the instrumented operating system
was  slow  enough  that  it  was  also  necessary  to  perform  all  simulations  in  single  user  mode.  Multiple 
processes  could  still  be  used  by  executing  them  in  background  mode. 

The  program  developed  is  a  very  comprehensive  and  flexible  simulator  with  a  great  deal  of 
potential,  but  it  does  have  some  problems  discovered  in  hindsight  that  should  be  addressed  in  future 
work. 


•  Program  size  is  still  a  concern;  more  memory  is  definitely  needed  to  reduce  paging  for  more 
accurate  simulations.  Increasing  memory  should  also  improve  execution  times. 




•  Program  speed  is  also  still  a  concern.  Ideally,  the  scheduler  should  have  been  modified  so 
that  instrumented  programs  use  a  longer  maximum  execution  interval  to  accommodate  their 
decreased speed, as done in [8].

•  The  block  replacement  data  showing  process  overwrites  is  not  distinguished  by  reference  types. 
This  is  an  oversight  and  limits  the  potential  usefulness  of  the  data,  as  it  is  impossible  to 
determine  the  contribution  of  each  type  of  reference  to  the  amount  of  interference. 

•  Using  virtual  addressing  is  simplistic  and  raises  other  issues.  Physical  based  addressing  should 
be  used  if  possible. 

•  The  impact  of  the  existing  memory  system  and  architecture  are  not  considered,  simply  assumed 
to  be  consistent  and  neglected. 

•  The  methods  used  to  correct  timing  problems,  such  as  scaling  hardclock  interrupts  and  ignoring 
interrupts  during  analysis,  are  not  verified.  An  extensive  analysis  should  be  conducted  to 
demonstrate  or  refute  their  effectiveness. 

•  The  synchronization  used  is  very  fragile.  Ideally  the  synchronization  method  used  within  the 
kernel  should  be  studied  and  incorporated  so  that  the  analysis  code  is  truly  re-entrant.  This 
is  particularly  necessary  for  more  reliable  analysis  of  threaded  programs. 

Even  with  these  potential  problem  areas,  however,  the  program  was  capable  of  performing  most 
of  the  desired  simulations,  and  provided  an  adequate  validation  of  the  multi-process  capability  of 
ATOM. 

4.2  Verification 

To  have  any  confidence  in  the  results  of  a  simulation,  the  simulator  must  first  be  verified 
to  ensure  that  it  does  indeed  produce  accurate  results.  The  developmental  nature  of  this  project 
precluded  a  direct  comparison  with  other  equivalent  work.  Default  tools  are  provided  with  ATOM 
which  can  incorporate  the  operating  system,  but  do  not  have  the  flexibility  to  verify  the  range  of 
cache  types  that  will  be  simulated.  Other  tools  are  not  readily  available  to  generate  comparable 
simulations.  Instead,  a  multi  step  approach  was  used  to  demonstrate  the  program’s  correctness. 

The first concern was the ability of the program to accurately capture the address traces.
This was verified by writing a second ATOM based application that simply captured traces
without performing any other processing. The references it captured were compared to those captured
by the simulator, and the two were identical. The second ATOM tool was simple enough that it
could be verified by inspection, so if it does not capture the address traces correctly then any flaw
is within the ATOM framework and cannot be addressed here.

The  next  aspect  to  be  verified  was  the  processing  of  the  reference  stream.  The  program  was 
slightly  modified  so  that  as  each  reference  was  processed,  it  was  also  stored  to  file.  A  trace  file  was 
generated  for  the  following  four  benchmarks: 

•  Compress 

•  Ear 

•  Espresso 

•  SC 

for  the  three  caches  shown: 

•  Unified  8192  byte  2  way  associative  cache  with  64  byte  blocks 

•  Split  2048  byte  fully  associative  caches  with  32  byte  blocks 

•  Split  4096  byte  direct  mapped  caches  with  32  byte  blocks 

The  trace  file  was  then  used  as  input  to  the  DineroIII  cache  simulator  to  test  the  cache  processing. 
DineroIII  and  simulation  results  were  identical  for  all  12  cases. 
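The thesis does not give the layout of the trace file, but DineroIII conventionally reads its "din" format: one reference per line, consisting of an access label (0 for a data read, 1 for a data write, 2 for an instruction fetch) followed by a hexadecimal address. Assuming that format was used, a trace writer could be as simple as the following Python sketch; the function name and the in-memory trace representation are illustrative only.

```python
# Access labels of the "din" trace format.
DIN_LABELS = {"read": 0, "write": 1, "ifetch": 2}


def to_din(trace):
    """Render (kind, address) pairs as din text: '<label> <hex address>' per line."""
    return "".join(f"{DIN_LABELS[kind]} {address:x}\n" for kind, address in trace)


sample = to_din([("ifetch", 0x120000000), ("read", 0x1400010A8), ("write", 0x1400010B0)])
```

The miss counts reported by DineroIII for such a file can then be compared directly with the simulator's own totals for the same cache configuration.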

A further test was used to ensure the simulation program executed correctly. The results of
single process simulations were compared to the results of benchmark cache analysis in other papers
[25, 45]. The cache performance was roughly the same in that the same general behavior patterns
were present; however, there were some differences. This is primarily due to differences in the inputs
used; in some cases the previous research simulated alternate inputs, or combinations of inputs,
different from those used here. Their results were also generated from optimized code which disregarded
shared library references. For our tests, code was not optimized and all references are captured, so
the difference is to be expected.

The final concern regarding the simulator was its repeatability. Given the threaded environment,
results could vary within a single execution. Given the non-deterministic environment, results
could also vary over multiple executions, so an experiment was conducted to determine the extent of
the possible variation. The same three caches mentioned above were simulated for Compress, Ear,
and Espresso 5 times each in succession. Each simulation modeled ten identical caches. The first
results showed that not only did performance vary, but so did the reference load. Each successive
execution of the same program after the initial execution had a reduced number of references from
the kernel. Upon reflection, we realized that this was due to the overhead, incurred on the first
execution, of loading the program into memory. All following executions had reduced operating system
overhead since the test program was already in memory, as can be seen in Figure 2.


Figure 2: Operating System Instruction Fetches Over Repeated Program Execution
(plot not reproduced; series shown: Trial 1, Trial 2)

To  eliminate  this  factor,  the  tests  were  repeated  without  having  each  program  executed 
sequentially.  The  variation  was  reduced,  but  not  eliminated.  For  complete  accuracy,  the  system 
was  rebooted  between  all  later  simulations.  The  second  set  of  results  highlighted  another  problem. 
In  the  output  file,  the  operating  system  references  varied  even  through  the  process  of  recording  the 
results  to  file.  Figure  3  shows  the  number  of  kernel  instruction  references  for  ten  identical  caches 
from  the  same  simulation.  The  increasing  number  of  references  for  the  later  caches  suggests  the 
point  made  in  the  previous  section,  that  in  the  operating  system  environment,  ATOM  does  not 
correctly  distinguish  between  calls  to  common  code  made  from  the  test  and  analysis  sections  of  the 
program. 

The  variation  within  a  single  simulation  was  also  due  to  the  threaded  nature  of  the  analysis, 
so  the  pseudo  re-entrance  measures  discussed  above  were  then  incorporated  into  the  program.  They 
eliminated  the  majority  of  the  operating  system  references  generated  by  the  simulation  routines,  as 
well  as  prevented  most  of  the  data  thrashing.  The  simulations  were  again  repeated,  although  only 
for  the  Espresso  benchmark  and  only  for  2  split  caches,  fully  associative  and  direct  mapped.  These 




Figure 3: Operating System Instruction Fetches Within Same Program Execution
(plot not reproduced)

results showed no variation at all within a single execution, and only a minor variation of .01 to .1
in the cache miss rates between different executions. Even before these measures were taken, the worst
variation was substantially less than expected; however, using single user mode for execution
limits the number of extraneous processes and greatly reduces the non-determinism of execution.
With the additional precautions, we are confident in the accuracy of the simulation results.

4.3  Simulations 

4.3.1  Platform  Information 

The  described  tests  were  performed  on  a  DEC  Alpha  3000  model  300,  a  RISC  based  AXP 
architecture.  The  root  partition  had  to  be  expanded  to  85MB  to  accommodate  the  larger  kernels 
used,  which  could  contain  up  to  a  48MB  test  kernel  in  addition  to  the  normal  root  residents.  The 
swap  space  was  originally  195MB  which  proved  to  be  insufficient  to  instrument  large  programs.  A 
second  local  disk  was  added  increasing  the  swap  space  to  323MB.  The  usr  partition  was  694MB  which 
was  generally  adequate  although  more  space  was  useful  at  some  points.  The  added  disk  included 
a  1090MB  scratch  directory  which  proved  to  be  invaluable  in  storing  results,  traces,  kernels,  and 
other  files.  The  critical  factor  was  memory.  The  system  only  had  64MB  of  main  memory,  so  during 
simulations  only  about  15MB  of  memory  was  available  for  test  programs.  For  future  efforts,  the 
memory  must  be  increased  to  improve  simulation  performance  and  accuracy. 

The operating system used was the DEC OSF/1 version 3.2A Unix kernel. Newer versions are
available; however, this version was sufficient for these tests. The ATOM tool used was version 2.20.
It  is  also  being  continuously  updated;  research  was  begun  with  version  2.13,  although  the  system  was 




upgraded  to  version  2.20  before  simulations  were  performed.  Each  new  version  of  ATOM  usually 
addresses  shortcomings  of  past  versions,  particularly  in  terms  of  intrusiveness,  and  refines  the  newer 
capabilities,  such  as  instrumenting  the  kernel,  so  the  most  current  version  available  should  be  used 
for  future  work.  The  test  programs  used  are  from  the  SPEC  92  benchmark  suite.  These  programs 
tend  to  focus  on  technical,  as  opposed  to  commercial,  applications.  They  are  more  computation 
intensive  than  other  potential  test  programs,  but  are  also  readily  available  and  a  standard  test  tool. 

4.3.2  Test  Parameters 

Simulations were performed capturing cache miss rates for program execution alone, programs
with the operating system, and multiple programs executed concurrently. The four benchmarks
used for these simulations were [74]:

Compress  The compress benchmark is the same program as the Unix compress utility. It is a
CPU intensive integer benchmark which compresses an input file using the Lempel-Ziv data
compression algorithm. It has a greater I/O content than the other benchmarks, so it is more
sensitive to the system and execution environment. Due to its nature, the program has a
repetitive instruction reference stream with a drastically less localized data reference stream.
A 1MB input file named in was used with the following command line:

# compress -f -c in > /dev/null

which causes the utility to route the compressed data to stdout, where it is discarded, instead
of back to the original file. This was done so that the execution of the benchmark did not affect
the input file, which was useful during repeated executions. As part of the benchmark
suite, the test calls for multiple iterations of compress, but for our tests only a single execution
is performed to reduce simulation time. The goal of this research is not to benchmark the
system used, so the full tests were not required.

GCC  GCC is the GNU C compiler, and is the most complex benchmark used. As a compiler, the
parsing, organization, and optimization performed produce a highly irregular reference stream.
Some I/O is performed, as well as a variety of other system calls, and the execution depends
heavily on the system used. The compiler was executed by:

# gcc -O -quiet stmt.i -o stmt




which caused it to optimize the source code and suppress any output. Again, the benchmark
suite called for compilation of multiple programs; however, only the single input stmt.i was
used for simplicity. One note regarding the instrumentation of gcc: it requires certain
ATOM flags that the other three benchmarks do not. The ATOM command line to be used with
gcc is:

y.atom  gcc.rr  -tool  userl  -heapbase  50000  -32addr 

These  are  required  for  ATOM  to  correctly  instrument  gcc,  as  the  compiler  uses  a  wider  range 
of  the  address  space  and  a  larger  heap  segment  of  memory. 

Espresso  Espresso is a tool for generating and optimizing Programmable Logic Arrays. Its
primary task is minimizing Boolean functions, so it also has a repetitive instruction stream, with
a more localized data stream than compress. It uses very few operating system services, and
is a small program (before tracing), so it normally requires little paging. The benchmark was
used with the tial.in input file with suppressed output as shown below:

# espresso tial.in > /dev/null

As with the other programs, the actual benchmark entails multiple input files, but only this one
was used for testing.

Alvinn  Alvinn  stands  for  Autonomous  Land  Vehicle  in  a  Neural  Network,  and  represents  a  neural 
network  control  system  capable  of  taking  data  from  a  video  camera  and  laser  range  finder  and 
generating  control  data  for  an  automated  vehicle.  The  benchmark  is  a  single  precision  floating 
point program which trains the network through backpropagation over 200 input epochs. It
performs minimal I/O, although it does use the floating point unit extensively. It is repetitive,
although with a much more complex structure than Compress. The command line used was
simply:

#backprop  >  /dev/null 

which  activates  the  training  model  with  the  input  files  h_o_w.txt,  i_h_w.txt,  in_pats.txt, 
and  out_pats.txt  residing  in  the  test  directory.  The  results  of  the  training  for  each  epoch 
are  the  only  output,  which  is  discarded. 




Each  simulation  was  performed  as  described  in  the  previous  sections  using  an  input  file 
of  40  caches  of  various  configurations.  Table  1  assigns  a  number  to  each  cache  which  is  used  for 
later  identification,  and  shows  the  different  characteristics  of  each.  Only  lower  associativities  are 
used  to  minimize  the  amount  of  looping  in  processing.  Other  characteristics  are  arbitrary  selections 
over  a  general  range,  with  a  limit  of  512  lines  per  cache  to  minimize  storage.  The  results  of  these 
simulations  are  discussed  in  the  next  section. 
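The 512 line limit can be checked directly from the cache parameters, using this document's definition of associativity as the number of blocks per cache line. The Python sketch below is illustrative; the helper name is hypothetical and only a sample of the Table 1 rows is included.

```python
MAX_LINES = 512  # storage limit quoted in the text


def num_lines(cache_size, block_size, assoc):
    """Cache lines = total bytes / (bytes per block * blocks per line)."""
    return cache_size // (block_size * assoc)


# A sample of Table 1 rows as (cache size, block size, associativity).
table1_sample = [
    (8192, 64, 2),
    (65536, 64, 2),
    (4096, 32, 1),
    (16384, 32, 1),
    (32768, 64, 1),
    (32768, 256, 4),
]

lines = [num_lines(*row) for row in table1_sample]
largest = max(lines)
```

The sampled rows reach the limit exactly (for example 32,768 bytes with 64 byte direct mapped blocks gives 512 lines), whereas a 32,768 byte direct mapped cache with 32 byte blocks would need 1,024 lines, which may be why Table 1 has no such entry.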




          Unified or Instruction            Data
ID  Type  Cache Size  Block Size  Assoc     Cache Size  Block Size  Assoc
 0     0       8,192          64      2             NA          NA     NA
 1     0      16,384          64      2             NA          NA     NA
 2     0      32,768          64      2             NA          NA     NA
 3     0      65,536          64      2             NA          NA     NA
 4     1       4,096          32      1          4,096          32      1
 5     1       4,096          32      2          4,096          32      2
 6     1       4,096          32      4          4,096          32      4
 7     1       4,096          64      1          4,096          64      1
 8     1       4,096          64      2          4,096          64      2
 9     1       4,096          64      4          4,096          64      4
10     1       4,096         128      1          4,096         128      1
11     1       4,096         128      2          4,096         128      2
12     1       4,096         128      4          4,096         128      4
13     1       8,192          32      1          8,192          32      1
14     1       8,192          32      2          8,192          32      2
15     1       8,192          32      4          8,192          32      4
16     1       8,192          64      1          8,192          64      1
17     1       8,192          64      2          8,192          64      2
18     1       8,192          64      4          8,192          64      4
19     1       8,192         128      1          8,192         128      1
20     1       8,192         128      2          8,192         128      2
21     1       8,192         128      4          8,192         128      4
22     1      16,384          32      1         16,384          32      1
23     1      16,384          32      2         16,384          32      2
24     1      16,384          32      4         16,384          32      4
25     1      16,384          64      1         16,384          64      1
26     1      16,384          64      2         16,384          64      2
27     1      16,384          64      4         16,384          64      4
28     1      16,384         128      1         16,384         128      1
29     1      16,384         128      2         16,384         128      2
30     1      16,384         128      4         16,384         128      4
31     1      32,768          64      1         32,768          64      1
32     1      32,768          64      2         32,768          64      2
33     1      32,768          64      4         32,768          64      4
34     1      32,768         128      1         32,768         128      1
35     1      32,768         128      2         32,768         128      2
36     1      32,768         128      4         32,768         128      4
37     1      32,768         256      1         32,768         256      1
38     1      32,768         256      2         32,768         256      2
39     1      32,768         256      4         32,768         256      4

Table 1: Simulated Cache Parameters




5  Simulation  Results 


Simulations  of  caches  with  varying  types,  cache  sizes,  associativities,  and  block  sizes  as 
described  in  Table  1,  were  performed  with  the  4  benchmarks.  The  data  generated  by  the  simulations 
has  been  analyzed  by  focusing  on  various  aspects  of  the  cache  behavior.  These  are  the  change  in 
cache  workload,  the  change  in  cache  performance  for  a  specific  process,  the  interference  generated 
between  the  processes,  and  the  net  change  in  cache  performance  over  all  processes.  Other  areas  of 
possible  exploration  include  studying  performance  differences  between  data  reads  and  writes,  and  a 
detailed  characterization  of  the  operating  system  performance.  In  some  instances  only  a  portion  of 
the  available  data  is  shown  in  figures.  Tables  of  all  results  are  provided  in  appendix  B. 

5.1  Cache  Workload 

Before looking at the cache performance, it is important to understand how introducing
the operating system and additional processes affects the memory reference stream. The first set
of simulations establishes a baseline by recording the cache’s performance for each benchmark alone.
The frequency of each type of reference is presented in Table 2.


Benchmark  Instruction Fetches     Data Reads  Data Writes   Total Data  Total References
Compress            87,045,943     22,412,017    8,521,660   30,933,677       117,979,620
GCC                160,240,141             --           --   69,272,173       229,512,314
Espresso           977,787,923    225,779,346   59,867,420  285,646,766     1,263,434,689
Alvinn           5,233,222,111  1,415,013,652  487,428,474           --                --

Table 2: Benchmark References


The  second  set  of  simulations  used  the  same  benchmarks,  but  included  the  operating  system. 
The  frequency  of  each  type  of  reference  is  shown  in  Table  3  for  each  process.  There  is  some  variation 
in  the  number  of  references  for  each  benchmark  due  to  execution  differences,  but  it  is  minimal.  Hello 
World  was  used  for  some  of  the  basic  program  testing,  and  is  included  as  a  curiosity.  For  the  other 
benchmarks,  the  operating  system  overhead  was  generally  small,  less  than  15%  of  the  total  number 
of  references.  For  a  small  program  such  as  Hello  World,  however,  the  operating  system  overhead 
becomes  the  dominant  source  of  memory  references,  totally  overshadowing  the  program. 

The  amount  of  overhead  introduced  by  the  operating  system  is  smaller  than  expected.  This 
is  because  the  tests  were  performed  in  single  user  mode,  and  a  majority  of  the  operating  system 
routines were not being executed. In this context, processes such as network and printer controllers,




and  the  variety  of  other  background  system  processes  are  considered  to  be  part  of  the  ‘operating 
system’.  One  test  using  ps  in  multi-user  mode  showed  over  40  different  processes  being  executed, 
only  one  of  which  was  actually  a  user  program.  For  these  system  processes  to  be  included,  they 
must  also  be  instrumented.  During  the  simulations  performed,  the  operating  system  references  are 
generally  just  the  overhead  required  by  the  test  programs. 


Benchmark    Instruction Fetches     Data Reads  Data Writes     Total Data  Total References
Hello World                1,247            207          135            342             1,589
OS                       337,491         84,403       51,332        135,735           473,226
Total                    338,738         84,610       51,467        136,077           474,815

Compress              87,045,969     22,412,010    8,521,661     30,933,671       117,979,640
OS                     5,567,602      1,518,924      802,242      2,321,166         7,888,768
Total                 92,613,571     23,930,934    9,323,903     33,254,837       125,868,408

GCC                  160,240,175     50,197,333   19,074,845     69,272,178       229,512,353
OS                    18,705,569      5,130,601    2,613,506      7,744,107        26,449,676
Total                178,945,744     55,327,934   21,688,351     77,016,285       255,962,029

Espresso             977,787,899    225,779,331   59,867,421    285,646,752     1,263,434,651
OS                    29,093,428      9,107,479    3,585,537     12,693,016        41,786,444
Total              1,006,881,327    234,886,810   63,452,958    298,339,768     1,305,221,095

Alvinn             5,233,222,045  1,415,013,630  487,428,474  1,902,442,104     7,135,664,149
OS                   197,365,478     60,413,211   25,986,851     86,400,062       283,765,540
Total              5,430,587,523  1,475,426,841  513,415,325  1,988,842,166     7,419,429,689

Table 3: Benchmark with Operating System References


The  operating  system  overhead  will  vary  depending  on  the  nature  of  the  program,  but  for 
these  benchmarks  it  remains  fairly  consistent.  The  percent  of  the  total  references  which  are  generated 
by the kernel is shown in Figure 4, and ranges from 2.89 to 12.05 percent. This can also be
viewed  as  the  percent  increase  in  number  of  references  as  seen  in  Figure  5,  which  has  a  similar 
range.  For  the  benchmarks  used,  the  program  references  still  dominate.  The  benchmarks  which 
require  minimal  resources  and  I/O  (Espresso  and  Alvinn)  are  the  least  affected  by  the  addition  of 
the  operating  system.  Compress  is  also  fairly  simple,  but  requires  a  larger  amount  of  I/O,  hence  its 
greater  overhead.  A  complex  program  such  as  the  GCC  compiler  is  affected  the  most.  The  amount 
of overhead found in these results is less than that found in past studies [1, 2]. Agarwal found the
operating system could increase the number of instructions by 5-75%, but that result was for an older
CISC architecture. Both studies did show that complex programs, such as compilers, are the most
affected. 
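The two views of operating system overhead used here differ only in the denominator: the kernel's share divides by all references, while the percent increase divides by the program's references alone. The Python sketch below applies both to the totals in Table 3; the function names are illustrative, and the figures themselves are not reproduced here, so this is a check on the arithmetic rather than a reconstruction of the plots.

```python
# (program references, operating system references), total column of Table 3.
table3_totals = {
    "Compress": (117_979_640, 7_888_768),
    "GCC": (229_512_353, 26_449_676),
    "Espresso": (1_263_434_651, 41_786_444),
    "Alvinn": (7_135_664_149, 283_765_540),
}


def kernel_share(program, kernel):
    """Percent of all references that come from the kernel."""
    return 100.0 * kernel / (program + kernel)


def percent_increase(program, kernel):
    """Percent growth in references when the kernel is included."""
    return 100.0 * kernel / program


shares = {name: kernel_share(*pair) for name, pair in table3_totals.items()}
increases = {name: percent_increase(*pair) for name, pair in table3_totals.items()}

# Per-type shares for two individual Table 3 entries.
espresso_ifetch_share = kernel_share(977_787_899, 29_093_428)
gcc_write_share = kernel_share(19_074_845, 2_613_506)

for name in table3_totals:
    print(f"{name:9s} share {shares[name]:5.2f}%  increase {increases[name]:5.2f}%")
```

On the totals, the kernel's share comes to roughly 6.3, 10.3, 3.2 and 3.8 percent for Compress, GCC, Espresso and Alvinn. The two per-type entries evaluate to about 2.89 and 12.05 percent, matching the endpoints quoted in the text, which suggests the range is taken over individual reference types rather than over the totals.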

Figure  6  shows  the  relative  distribution  of  each  reference  type  within  the  workload  for  both 
the  program  and  its  operating  system  overhead.  Both  the  program  and  operating  system  references 
have  about  the  same  distribution,  with  roughly  70%  instruction  fetches.  This  is  consistent  with 




Figures 4 and 5 (plots not reproduced; axes: % References From Kernel, % Increase In References)

Figure 6: Distribution of Reference Types (legend: data writes, data reads, instruction fetches)

[8].  The  small  proportion  of  data  writes  explains  the  seemingly  larger  change  seen  in  the  previous 
two  figures  —  there  are  relatively  fewer  data  writes  so  a  smaller  change  generates  a  larger  percent 
difference. 

The final set of simulations was performed executing two benchmarks concurrently, capturing
references from each and the operating system. Results were logged after each test program
completed.  The  first  report  contains  the  information  of  interest,  the  cache  performance  with  two 
competing  user  programs.  The  second  report  includes  the  period  of  time  after  the  first  process  had 
completed,  so  only  a  single  user  process  was  executing  during  part  of  its  tracing  period.  Since  this 
analysis  focuses  on  the  effects  of  multiple  processes,  the  second  report  has  been  discarded.  For  this 
reason,  the  data  shown  in  Table  4  omits  a  portion  of  the  execution  of  the  longer  process  in  each 
case.  Any  future  references  to  these  simulations  also  refer  specifically  to  the  cache  performance  at 
the  end  of  the  first  program. 

One  fact  that  is  not  visible  from  this  table  is  that  when  both  programs  have  completed,  the 
cumulative  operating  system  overhead  (measured  in  number  of  references)  is  greater  than  the  sum  of 
the  overhead  for  each  program  individually,  as  shown  in  Table  5.  If  the  number  of  operating  system 
references generated when the benchmarks are executed separately are added (the first column),
this  value  is  less  than  the  number  of  operating  system  references  generated  when  the  same  two 
benchmarks  are  executed  concurrently  (the  second  column).  This  highlights  the  increased  operating 
system  activity  required  to  switch  between  multiple  processes,  roughly  a  20-40%  increase. 




Benchmarks  Instruction Fetches    Data Reads   Data Writes    Total Data   Total References

Compress             87,045,885    22,411,994     8,521,651    30,933,645        117,979,530
GCC                  68,021,687    21,218,807     8,094,452    29,313,259         97,334,946
OS                   28,102,411     7,468,658     4,160,003    11,628,661         39,731,072
Total               183,169,983    51,099,459    20,776,106    71,875,565        255,045,548

Compress             87,045,885    22,411,994     8,521,651    30,933,645        117,979,530
Espresso             99,475,944    24,280,822     4,659,787    28,940,609        128,416,553
OS                   15,541,809     4,310,868     2,247,254     6,558,122         22,099,931
Total               202,063,638    51,003,684    15,428,692    66,432,376        268,496,014

GCC                 160,240,175    50,197,333    19,074,845    69,272,178        229,512,353
Espresso            224,015,827    51,131,704    12,097,918    63,229,622        287,245,449
OS                   39,004,710    10,758,087     5,592,574    16,350,661         55,355,371
Total               423,260,712   112,087,124    36,765,337   148,852,461        572,113,173

Table 4: Concurrent Benchmarks with Operating System References
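The columns of Table 4 are related by simple sums: total data is reads plus writes, and total references is instruction fetches plus total data. The following Python sketch, which is an illustration and not part of the original tool, checks those relations for the Compress/GCC run and reports each source's share of the reference stream. The dictionary layout and the function names are assumptions made for this example; the counts are copied from the table.

```python
# Reference counts for the Compress/GCC run, copied from Table 4.
# Each tuple is (instruction fetches, data reads, data writes).
counts = {
    "Compress": (87_045_885, 22_411_994, 8_521_651),
    "GCC": (68_021_687, 21_218_807, 8_094_452),
    "OS": (28_102_411, 7_468_658, 4_160_003),
}


def total_data(row):
    """Data references are the sum of data reads and data writes."""
    _, reads, writes = row
    return reads + writes


def total_references(row):
    """All references: instruction fetches plus data reads and writes."""
    return sum(row)


grand_total = sum(total_references(row) for row in counts.values())


def share(name):
    """Fraction of the whole reference stream generated by one source."""
    return total_references(counts[name]) / grand_total


if __name__ == "__main__":
    for name, row in counts.items():
        print(f"{name:8s} data={total_data(row):>12,d} "
              f"refs={total_references(row):>12,d} share={share(name):6.1%}")
    print(f"Total    refs={grand_total:>12,d}")
```

Under these counts the operating system accounts for a little under 16% of all references in the Compress/GCC run.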


Benchmarks          Sum of Individual Overheads   Concurrent Overhead

Compress/GCC                         34,338,444            47,433,154
Compress/Espresso                    49,675,212            59,365,363
GCC/Espresso                         68,236,120            89,030,467

Table 5: System Overhead Comparison
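The size of the increase quoted in the text can be recomputed directly from the two columns of Table 5. The short Python sketch below does this; the variable and function names are chosen for illustration only, and the figures are those in the table.

```python
# (sum of individual overheads, concurrent overhead), in references, from Table 5.
overheads = {
    "Compress/GCC": (34_338_444, 47_433_154),
    "Compress/Espresso": (49_675_212, 59_365_363),
    "GCC/Espresso": (68_236_120, 89_030_467),
}


def percent_increase(separate_sum, concurrent):
    """Percent growth in OS references when the pair runs concurrently."""
    return 100.0 * (concurrent - separate_sum) / separate_sum


if __name__ == "__main__":
    for pair, (separate_sum, concurrent) in overheads.items():
        print(f"{pair:18s} {percent_increase(separate_sum, concurrent):5.1f}%")
```

The three pairs come out at roughly 38%, 20%, and 30% respectively, which matches the range given above.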


A problem arose when certain programs (or combinations of programs) were traced, generating
the trap: invalid memory access error mentioned previously. The error appears to be related to
the size or length of the test programs. Benchmarks such as Xlisp (9,561,089,165 references) and
Ear (17,375,158,291 references) would crash the platform if simulated with the operating system.
Similarly, executing any of the three smaller benchmarks concurrently with Alvinn would crash the
system, as would any three programs in combination. While this problem limited the scope of the
simulations, correcting it was beyond the purview of this research.

5.2  Impact  on  Process  Performance 

The  simplest  way  to  visualize  the  impact  of  the  operating  system  and  additional  processes  is 
to  measure  their  effect  on  the  cache  performance  for  a  particular  program’s  reference  stream.  Figures 
7  through  14  show  the  cache  miss  rates  for  benchmark  references  only,  for  each  of  the  4  benchmarks. 
The  baseline  is  the  result  from  the  single  process  cache  simulation.  The  other  sets  of  results  are 
essentially  the  same  reference  stream  but  with  transient  misses.  Any  performance  changes  are  due 
strictly  to  these  transient  effects. 

The single process results exhibit normal cache behavior. As expected, increasing cache
size decreases miss rate. A larger cache can contain more, if not all, of a program’s working set,
thus reducing capacity misses. Also, a larger cache will have fewer locations assigned to each line,
potentially reducing conflict misses. Increasing associativity also decreases miss rates, although with
diminishing returns; the improvement from A=2 to A=4 is less than the improvement from A=1 to
A=2. Associativity can reduce conflict misses by allowing a line to maintain more than one block at
a time, but the benefits are limited by the number of references to any one line. Since the caches use
a constant area, increasing the associativity decreases the number of possible indices, thus increasing
the stress on a single index. For this reason, in some instances increasing associativity can increase
the miss rate (e.g. Alvinn). Increasing the block size increases the amount of memory fetched on
each miss. This is generally beneficial for instruction references, which exhibit spatial locality, but
the reverse may be true for data references. Depending on the benchmark, data miss rates can either
increase (e.g. Compress) or decrease (e.g. Espresso) as block size increases, but this trend is also
related to associativity and other factors. Increasing block size also decreases the number of cache
indices, so again the load on each line is increased, potentially negating any benefits. These results
are comparable to those found in [25, 45, 56].
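The trade-off between associativity, block size, and the number of indices in a constant-area cache can be made concrete with a small calculation. The Python sketch below assumes conventional modulo placement on the block number, which may differ from the indexing used by the simulator in this work; the function names are hypothetical, and the block sizes of 32 to 256 bytes are illustrative values.

```python
def num_sets(cache_bytes, block_bytes, associativity):
    """Number of distinct cache indices (sets) for a fixed-size cache."""
    if cache_bytes % (block_bytes * associativity) != 0:
        raise ValueError("cache size must be a multiple of block size times associativity")
    return cache_bytes // (block_bytes * associativity)


def set_index(address, cache_bytes, block_bytes, associativity):
    """Index selected by an address, assuming simple modulo placement."""
    block_number = address // block_bytes
    return block_number % num_sets(cache_bytes, block_bytes, associativity)


if __name__ == "__main__":
    # For a fixed S, raising A or the block size shrinks the number of indices.
    for associativity in (1, 2, 4):
        for block_bytes in (32, 64, 128, 256):
            sets = num_sets(4096, block_bytes, associativity)
            print(f"S=4096 A={associativity} B={block_bytes:3d} -> {sets:3d} indices")
```

With S=4096, a direct-mapped cache with 32-byte blocks has 128 indices, while a 4-way cache with 256-byte blocks has only 4, so far more addresses compete for each index.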

When the single process results are compared with the other simulations, these trends are
generally unaffected. In most cases, the results follow the same patterns but with a noticeable increase
in cache miss rates. The amount of increase may vary by cache or remain relatively constant,
depending on the characteristics of the particular benchmark being considered. This increase is the
error in assuming that cache behavior can be defined by a single process simulation, and shows the
difference between a single program’s cache performance when it is considered alone versus when
it is considered in a multiprocess simulation. As can be seen, the impact of the operating system
is much smaller than that of an additional process. This is logical, considering the operating
system normally executes for shorter durations as it services system calls and interrupts. The
impact of additional processes is generally most pronounced in those caches that already exhibit poor
performance, although this does depend on the benchmark.

It is also interesting to consider the distribution of misses. Figures 7 through 13 show
the percent of misses that were from instruction references. Although instructions make up the
majority of references, they usually account for a minority of misses, as expected given their
greater locality. For programs such as Compress or Alvinn with a great deal of spatial locality in
their instructions but not their data, the loss of locality due to transient interference is visible
in the increased proportion of instruction misses found in the simulations which included
the operating system and additional processes. Other programs such as Espresso may be affected
either way, although data misses still predominate. A more complex program such as GCC has
much less locality in its reference stream, as can be seen by the fact that instructions account for
as much as 65% of its misses. Hence when the additional processes are considered, it is possible for
data cache hit rates to be affected more and the ratio to go down.

5.3  Process  Interference 

Another  way  to  visualize  the  impact  of  the  additional  references  is  to  analyze  the  proportion 
of  intrinsic  versus  extrinsic  interference  seen  by  the  various  test  programs.  The  percentage  of  misses 
attributed  to  intrinsic  interference  can  be  approximated  by  the  percent  of  misses  where  the  reference 
overwrote  a  block  containing  information  from  the  same  program.  The  alternative  is  for  the  reference 
to  miss  and  overwrite  another  program’s  data,  highlighting  extrinsic  interference.  A  certain  number 
of  references  will  miss  and  overwrite  invalid  data  at  start  up,  but  these  are  finite  (based  on  cache 
size), and will not significantly affect the percentage. The self overwrite percentage is shown for
each cache for the four benchmarks in Figures 19 through 22. When a block is overwritten, no test
is performed to see if the evicted data is live, nor is there a check of the new data to determine
if it has been accessed before, so these figures do not measure intrinsic interference exactly, but
should be comparable.
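The self overwrite measurement described above can be sketched by tagging every cache line with the process that filled it. The Python example below is a minimal illustration under stated assumptions, not the simulator used in this work: it models a direct-mapped cache only, treats identical addresses from different processes as different blocks, and uses a hypothetical class name and miss labels ("cold", "self", "external").

```python
from collections import Counter


class OwnerTaggedCache:
    """Direct-mapped cache whose lines remember which process filled them."""

    def __init__(self, cache_bytes=4096, block_bytes=32):
        self.block_bytes = block_bytes
        self.n_lines = cache_bytes // block_bytes
        self.lines = [None] * self.n_lines  # each entry: (block, owner) or None
        self.misses = Counter()             # keyed by (requesting process, kind)

    def access(self, owner, address):
        """Simulate one reference and return 'hit' or the kind of miss."""
        block = address // self.block_bytes
        index = block % self.n_lines
        resident = self.lines[index]
        if resident == (block, owner):
            return "hit"
        if resident is None:
            kind = "cold"        # overwrote invalid data at start up
        elif resident[1] == owner:
            kind = "self"        # evicted a block loaded by the same process
        else:
            kind = "external"    # evicted a block loaded by another process
        self.lines[index] = (block, owner)
        self.misses[(owner, kind)] += 1
        return kind

    def self_overwrite_percent(self, owner):
        """Percent of this process's misses that overwrote its own blocks."""
        total = sum(n for (who, _), n in self.misses.items() if who == owner)
        if total == 0:
            return 0.0
        return 100.0 * self.misses[(owner, "self")] / total


if __name__ == "__main__":
    cache = OwnerTaggedCache(cache_bytes=64, block_bytes=32)  # two lines only
    for owner, address in [("A", 0), ("A", 64), ("B", 0), ("A", 64), ("A", 64)]:
        print(owner, address, cache.access(owner, address))
    print(f"A self overwrite: {cache.self_overwrite_percent('A'):.1f}%")
```

As in the text, cold misses are counted in the denominator, and no check is made of whether the evicted block was still live.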

The  most  basic  simulation  with  a  single  benchmark  as  input  will  have  100%  of  its  misses 
due  to  internal  considerations,  by  definition.  When  the  operating  system  is  added,  roughly  10-20% 
of  the  misses  are  external  overwrites,  due  to  the  impact  of  the  OS  references.  Adding  an  additional 
process  to  the  simulation  increases  the  external  impact  to  40-70%,  depending  on  the  cache  and 
particular  program.  It  is  unfortunate  that  it  was  not  possible  to  perform  simulations  with  a  greater 
multitasking  level  so  that  a  trend  might  be  visible. 

Smaller  caches  are  affected  more  by  extrinsic  interference  as  expected,  as  are  caches  with 
lower  associativities.  As  each  process  is  executed,  its  references  are  loaded  into  the  cache.  A  smaller 
cache  may  be  totally  overwritten  by  the  new  data,  while  a  larger  cache  may  be  able  to  retain 
a  portion  of  the  previous  program’s  working  set.  Program  characteristics  such  as  the  amount  of 
system  overhead,  as  well  as  working  set  size  and  fluctuation,  affect  the  amount  of  interference,  but 
are  more  difficult  to  quantify  without  an  extensive  trace  analysis. 


[Plots of Miss Rate (%) for panels Instruction References, A=1 and Instruction References, A=2. Series: alone, w/ OS, w/ OS & GCC, and w/ OS and Espresso, each at S=4096, 8192, 16384, and 32768.]

Figure 7: Process Instruction Reference Miss Rates For Compress


[Plots of Miss Rate (%) for panels Data References, A=1; Data References, A=2; and Data References, A=4.]

Figure 8: Process Data Reference Miss Rates For Compress


[Plots of Miss Rate (%) for panels Instruction References, A=1; Instruction References, A=2; and Instruction References, A=4.]

Figure 9: Process Instruction Reference Miss Rates For GCC


[Plots of Miss Rate (%) for panel Data References, A=1. Series: alone, w/ OS, w/ OS & Compress, and w/ OS and Espresso, each at S=4096, 8192, 16384, and 32768.]

Figure 10: Process Data Reference Miss Rates For GCC


[Plots of Miss Rate (%) for panels Instruction References, A=1; Instruction References, A=2; and Instruction References, A=4. Series: alone, w/ OS, w/ OS & Compress, and w/ OS and GCC, each at S=4096, 8192, 16384, and 32768.]

Figure 11: Process Instruction Reference Miss Rates For Espresso


[Plots of Miss Rate (%) for panel Data References, A=1. Series: alone, w/ OS, w/ OS & Compress, and w/ OS and GCC, each at S=4096, 8192, 16384, and 32768.]

Figure 12: Process Data Reference Miss Rates For Espresso


[Plot panels: Data References, A=1; Data References, A=2; Data References, A=4. Series: alone and w/ OS, each at S=4096, 8192, 16384, and 32768.]


[Plots of % Misses Self Overwritten versus Cache #.]

Figure 19: Percent Self Overwritten for Compress

Figure 20: Percent Self Overwritten for GCC

Figure 21: Percent Self Overwritten for Espresso

Figure 22: Percent Self Overwritten for Alvinn


5.4  Impact  on  Cache  Performance 

So far this analysis has focused on the cache performance within the context of a single
program.  The  impact  of  the  operating  system  and  additional  processes  is  also  a  factor  when  the 
aggregate  cache  performance  is  considered,  encompassing  all  references  from  the  trace.  These  results 
are  shown  in  Figures  23  through  30,  which  are  organized  identically  to  the  ones  before.  The  single 
process  simulations  for  each  benchmark  are  again  used  as  a  baseline,  with  the  total  cache  performance 
plotted  for  each  simulation  that  involved  that  benchmark.  Results  from  simulations  with  multiple 
processes  are  shown  in  multiple  figures,  but  because  all  references  are  considered,  the  net  cache 
performance  is  the  same  regardless  of  which  process  is  used  as  the  perspective. 

The total miss rate is essentially a weighted average of the miss rates of the component
processes, as shown below:

\[ M = \frac{\sum_p m_p}{\sum_p r_p} \qquad (1) \]

where $M$ is the total miss rate, $m_p$ is the number of misses for each process, and $r_p$ is the
number of references for each process. Because it is a weighted average, the behavior of the total
miss rate may be dominated by the miss rate behavior of one of the component processes. A process may
dominate the average because of the number of references it generates, as a benchmark does when
combined with its respective operating system overhead (which has fewer references). A process may
also dominate the average because of its performance. For example, Compress suffers from particularly
poor data cache performance, so any simulation involving Compress will have the average data
cache performance dominated by Compress’ characteristics. Conversely, Compress also has the
lowest instruction cache miss rates, so the average instruction cache performance is dominated by
whatever process is executed with Compress. The dominant process will define the gross performance
characteristics of the overall cache behavior, for instance the miss rate fluctuations as a certain
parameter, such as cache size, varies.
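The way one process can dominate the aggregate figure is easy to see numerically. The Python sketch below takes the total miss rate to be total misses divided by total references, and shows that this equals the average of the per-process miss rates weighted by each process's share of references. The function names and the miss and reference counts in the example are hypothetical and are not measurements from this work.

```python
def total_miss_rate(per_process):
    """Total misses over total references; per_process maps name -> (misses, references)."""
    misses = sum(m for m, _ in per_process.values())
    references = sum(r for _, r in per_process.values())
    return misses / references


def weighted_average_miss_rate(per_process):
    """The same quantity written as a reference-weighted average of miss rates."""
    references = sum(r for _, r in per_process.values())
    return sum((r / references) * (m / r) for m, r in per_process.values())


if __name__ == "__main__":
    # Hypothetical counts: P1 misses often but issues few references.
    example = {"P1": (50, 1000), "P2": (10, 4000)}
    print(f"total    = {total_miss_rate(example):.4f}")
    print(f"weighted = {weighted_average_miss_rate(example):.4f}")
```

In this made-up case P1 has a 5% miss rate and P2 a 0.25% miss rate, yet the total is 1.2%, because P2 supplies four fifths of the references.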

The  impact  of  each  benchmark  can  be  seen  by  its  contribution  to  the  total  miss  rate,  but  the 
impact  of  the  operating  system  is  not  as  visible.  Figures  31  and  32  show  the  percent  of  misses  that 
are  due  to  kernel  references  for  instructions  and  data  respectively.  As  can  be  seen,  the  impact  to  the 
data  cache  is  much  more  consistent  than  that  to  the  instruction  cache.  The  instruction  impact  varies 
significantly  depending  on  the  benchmark  in  question  and  the  demands  it  places  on  the  operating 
system.  Cache  design  parameters  can  also  be  a  factor,  as  the  larger  caches  have  a  larger  portion  of 




the  misses  due  to  the  kernel.  This  is  logical  as  the  programs  with  their  larger  footprints  can  take 
advantage  of  the  larger  caches,  while  the  operating  system  with  its  shorter  execution  intervals  may 
never  leave  the  cache  warm  up  phase. 

5.5  Summary 

Based  on  the  evidence  shown  here,  a  few  generalizations  can  be  made  about  the  observed  cache 
performance. 

•  Both  operating  system  and  additional  user  processes  will  significantly  affect  cache  performance, 
with  the  user  programs  generating  the  largest  impact. 

•  For  a  given  process,  the  performance  is  always  degraded  due  to  the  external  interference, 
although  if  the  net  performance  over  multiple  processes  is  considered  it  may  be  better  than 
the  performance  for  just  one  of  the  component  processes  due  to  averaging. 

•  The  primary  source  of  this  performance  degradation  is  in  the  loss  of  temporal  locality.  The 
interference  between  the  various  processes  does  not  affect  each  process’  spatial  locality,  but 
with  frequent  interruptions  in  process  execution  there  is  a  loss  of  temporal  locality  across  each 
interruption. 

•  The  worst  degradation  is  in  caches  which  already  suffered  from  poor  performance. 

•  The  amount  of  degradation  and  any  patterns  it  follows  depends  greatly  on  the  specific  processes 
involved,  and  the  effects  observed  can  vary  greatly.  This  is  due  to  the  differences  in  program 
behavior  such  as  system  demands  (system  calls,  interrupts)  and  footprint  (size,  length,  working 
set). 

•  The  overall  cache  performance  is  an  average  of  the  performance  of  the  component  processes. 
The  individual  process  performance  characteristics  are  interrelated,  so  are  difficult  to  determine 
independently. 

This  is  contrary  to  some  of  the  initial  assumptions  made  in  [1,  2,  3],  which  have  since  been  discarded. 
These  results  are  more  comparable  to  those  found  in  [11,  12,  13]. 
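The loss of temporal locality across interruptions can be illustrated with a toy experiment. The Python sketch below is an assumption-laden illustration rather than a reproduction of the simulations reported here: it uses a direct-mapped cache of eight lines, two hypothetical processes that each loop over an eight-block working set, and a fixed quantum of references between switches. All names are invented for the example.

```python
def miss_rate_for(target, schedule, n_lines=8):
    """Run a direct-mapped cache over the schedule and return the target's miss rate.

    The schedule is a sequence of (process, block_number) references. Blocks
    from different processes never match, as if each had its own address space.
    """
    lines = [None] * n_lines
    references = misses = 0
    for process, block in schedule:
        index = block % n_lines
        hit = lines[index] == (process, block)
        lines[index] = (process, block)
        if process == target:
            references += 1
            misses += 0 if hit else 1
    return misses / references


def loop_refs(process, working_set, count):
    """Generate count references cycling through blocks 0..working_set-1."""
    return [(process, i % working_set) for i in range(count)]


def interleave(processes, working_set, quantum, quanta):
    """Round-robin schedule: each quantum is one process looping over its working set."""
    schedule = []
    for q in range(quanta):
        process = processes[q % len(processes)]
        schedule.extend(loop_refs(process, working_set, quantum))
    return schedule


if __name__ == "__main__":
    alone = miss_rate_for("A", loop_refs("A", 8, 160))
    shared = miss_rate_for("A", interleave(["A", "B"], 8, 16, 20))
    print(f"A alone:  {alone:.2%}")
    print(f"A with B: {shared:.2%}")
```

In this setup process A issues the same 160 references in both runs, so its spatial behavior is unchanged; only the reuse of blocks across each interruption is lost, and its miss rate rises from 5% to 50%.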


[Plots of Miss Rate (%) for panel Instruction References, A=1. Series: alone, w/ OS, w/ OS & GCC, and w/ OS and Espresso, each at S=4096, 8192, 16384, and 32768.]

Figure 23: Instruction Cache Miss Rates With Compress


[Plots of Miss Rate (%) versus Block Size (Bytes) for panels Data References, A=1; Data References, A=2; and Data References, A=4. Series: alone, w/ OS, w/ OS & GCC, and w/ OS and Espresso, each at S=4096, 8192, 16384, and 32768.]

Figure 24: Data Cache Miss Rates With Compress


[Plots of Miss Rate (%) versus Block Size (Bytes) for panels Instruction References, A=1; Instruction References, A=2; and Instruction References, A=4. Series: alone, w/ OS, w/ OS & Compress, and w/ OS and Espresso, each at S=4096, 8192, 16384, and 32768.]

Figure 25: Instruction Cache Miss Rates With GCC


[Plots of Miss Rate (%) for panels Data References, A=1 and Data References, A=2. Series: alone, w/ OS, w/ OS & Compress, and w/ OS and Espresso, each at S=4096, 8192, 16384, and 32768.]


[Plots of Miss Rate (%) for panels Instruction References, A=1 and Instruction References, A=2. Series: alone, w/ OS, w/ OS & Compress, and w/ OS and GCC, each at S=4096, 8192, 16384, and 32768.]

Figure 27: Instruction Cache Miss Rates With Espresso


[Plots of Miss Rate (%) for panel Data References, A=1. Series: alone, w/ OS, w/ OS & Compress, and w/ OS and GCC, each at S=4096, 8192, 16384, and 32768.]


[Plot versus Block Size (Bytes), with axis ticks at 32, 64, 128, and 256.]

Figure 29: Instruction Cache Miss Rates With Alvinn


Figure 30: Data Cache Miss Rates With Alvinn

[Panels: Data References for A=1, A=2, and A=4; curves for alone and w/ OS at S = 4096, 8192, 16384, and 32768.]


Figure 31: Percent Instruction Misses From Kernel

Figure 32: Percent Data Misses From Kernel

[Both plots: % Misses From Kernel versus Cache #, with curves for Compress, GCC, Espresso, Backprop, Compress & GCC, Compress & Espresso, and GCC & Espresso.]


5.6  Future  Work 


With  the  simulations  already  performed,  there  is  still  a  considerable  amount  of  data  analysis 
that  could  be  performed,  as  more  specific  aspects  of  cache  performance  are  considered.  Also,  a 
number  of  improvements  to  the  simulation  program  were  outlined  in  section  4,  which  should  ideally 
be  included  before  any  future  work  is  performed  with  this  tool.  The  most  fundamental  change  should 
be  towards  modeling  more  of  the  memory  system,  to  include  such  aspects  as  traffic  to  memory, 
physical  address  mapping,  write  policies,  and  cache  service  times.  Other  additions  can  be  readily 
made  to  the  cache  simulator  to  study  specific  aspects  of  cache  design,  such  as  alternative  replacement 
algorithms in associative caches, different address hashing algorithms, or prefetching possibilities.

Other  more  substantial  changes  could  be  made  to  generate  different  forms  of  performance 
data.  One  area  is  analyzing  sampled  cache  performance,  looking  at  cache  performance  over  shorter 
time  periods  to  study  the  effects  of  short  term  working  set  changes.  Another  area  is  tracing  the 
operating  system  in  particular,  capturing  data  from  the  various  kernel  threads  separately,  as  well  as 
determining  the  source  of  system  calls.  Another  possibility  is  to  provide  a  more  detailed  reference 
record  so  that  reference  gap  information  is  available  to  study  interference  patterns  in  more  detail. 
On  the  most  generic  level,  such  a  tool  can  also  be  used  to  generate  traces  for  other  work.  Finally, 
this  research  will  provide  the  background  necessary  for  continued  study  of  the  operating  system 
through  the  development  of  new  ATOM  tools. 




6  Context  Switch  Model 

6.1  Theory 

With  ATOM,  it  is  now  possible  to  generate  simulations  with  a  broader  scope  than  just  a 
single  process.  As  a  commercially  available  tool  with  a  great  deal  of  flexibility,  ATOM  is  simpler 
to  use  than  past  methods,  but  it  still  requires  a  significant  amount  of  additional  time  and  resources 
to perform the cache analysis. An improvement would be to approximate the accuracy of a
comprehensive simulation without the additional effort. One possible method is to develop a synthetic
model  which  would  generate  complex  traces  without  the  execution  of  programs.  Such  a  technique 
would  exercise  the  entire  cache  like  a  real  environment,  but  is  difficult  to  verify  and  is  beyond  the 
scope  of  this  work. 

A  simpler  method  is  to  study  a  single,  more  focused,  aspect  of  cache  performance.  Here  we 
only  consider  the  performance  of  a  single  process,  but  in  the  context  of  a  multi-process  environment, 
similar  to  that  considered  by  Agarwal  in  [3].  Instead  of  an  entire  synthetic  workload,  an  analytical 
model  can  be  used  in  conjunction  with  a  single  process  trace.  In  this  way,  the  cache  behavior  of 
a  single  process  can  be  predicted  more  accurately  with  only  a  simple  simulation.  The  model  is 
responsible  for  injecting  the  desired  multi-process  characteristics  into  the  simulation,  which  can  be 
achieved  through  a  statistical  approach. 

The  simulation  of  a  single  process  will  identify  its  own  characteristics,  and  the  introduction 
of  the  statistical  model  will  incorporate  the  transient  effects  of  a  complex  environment.  This  can 
be  achieved  by  analyzing  the  effect  of  the  operating  system  and  additional  processes  on  a  single 
process,  and  mimicking  this  in  the  simulation  program.  As  will  be  seen,  this  is  essentially  modeling 
context  switch  characteristics  in  the  cache  [31,  41,  56].  Though  it  will  not  be  as  accurate  as  the  full 
simulation,  it  will  be  faster  and  much  easier  to  execute.  For  an  approximate  result,  it  is  much  more 
efficient. 

From  the  perspective  of  a  single  process,  it  is  the  sole  user  of  the  cache  at  any  given  point 
in  time  (assuming  a  uniprocessor  environment).  However,  the  time  the  process  is  actually  being 
executed  is  not  continuous  for  its  entire  lifetime.  The  process  is  instead  broken  up  into  shorter 
continuous  segments  separated  by  context  switches.  Between  these  segments,  operating  system 
routines or other processes are being executed, which can overwrite some or all of the process's cache
blocks. Assuming all the various processes are independent, these interruptions are transparent to
any single process and each process is not “aware” of the other processes being executed. Here the
term interruption is used to denote the time from when a given program is switched out of execution
to the point it is returned to execution. The net effect to the cache is that from a specific program's
perspective, it is executed continuously, but at certain times during its execution some or all of its
cache blocks are overwritten or invalidated. Figure 33 shows the difference between this perspective
and the actual environment, showing a basic time space diagram of process execution. This would
be the condition in a multitasked uniprocessor where each thread or program is considered to be a
unique process with a unique reference stream.


[Figure 33: time space diagram with rows Proc 0 through Proc 3, a separate Proc 1 row, and Time on the horizontal axis.]


This  would  suggest  that  by  modeling  context  switches,  the  gap  between  single  and  multiple 
process  simulations  can  be  bridged.  There  are  basically  two  fundamental  questions  that  must  be 
addressed  by  such  a  statistical  model: 

1.  how  often  the  execution  of  a  program  is  interrupted  by  a  context  switch,  and 

2.  what  is  the  impact  to  the  cache  state  caused  by  this  interruption. 

These questions are not easily answered. Timing of context switches can depend on many variables
including the physical system state, how the system is loaded, and characteristics of the programs.
Similarly, the impact will depend on the state of program execution, the amount of live data present
in the cache, and the amount of overlap, if any, between the working sets of the various programs. The
model will depend heavily on the particular system involved, and must be developed with the
hardware, operating system, and test programs in mind. Once these factors are understood, they
can be incorporated into the simulation program so that simulations would theoretically provide
results comparable to the program being executed in a realistic environment [23].


6.2  Development 

The  first  step  in  developing  the  model  is  to  ensure  that  it  is  applicable  to  our  test  system 
[17,  39,  65,  69].  Our  Alpha  based  system  meets  the  criteria  described  above.  It  is  a  single  processor 
machine  running  OSF,  which  can  execute  multiple  processes  on  a  timesharing  basis.  Instructions 
and  data  can  be  shared  between  processes,  but  their  dependence  can  be  minimized  by  choosing 
appropriate  test  programs.  The  impact  of  the  test  platform  on  the  traces  is  assumed  to  be  consistent 
across  all  simulations  and  is  ignored.  The  references  generated  are  64  bit  virtual  addresses  in  a 
continuous  address  space,  so  no  adaptation  of  the  simulation  model  is  necessary. 

Understanding  the  operating  system  is  the  most  important  aspect  of  developing  the  model 
[4, 9, 18, 70, 72, 71]. The operating system both generates its own set of references and controls
the scheduling of the other reference streams. The OSF/1 operating system is a threaded collection
of processes which includes system calls, interrupt handlers, and other overhead management/control
routines.  These  can  be  modeled  simply  as  a  collection  of  additional  processes  of  varying  length  that 
are  executed  at  random  intervals.  The  processes  are  switched  in  and  out  of  execution  just  like  the 
test  programs.  The  priority  of  these  processes  would  require  that  they  occur  at  any  time,  preempting 
the  execution  of  the  test  process.  The  various  threads  that  make  up  the  kernel  are  not  independent, 
and  may  share  substantial  amounts  of  data.  By  considering  the  threads  of  the  kernel  collectively 
as the operating system overhead, as was done in the earlier simulations, the model can neglect
this  shared  data  with  minimal  loss  of  accuracy.  The  remaining  issue  is  the  degree  of  data  sharing 
between  the  program  and  the  operating  system,  which  is  difficult  to  pinpoint.  For  the  purpose  of  this 
model,  this  dependence  is  assumed  to  be  minimal  and  is  neglected,  which  is  a  reasonable  assumption 
for  the  choice  of  benchmarks.  Any  simulation  of  threaded  programs  or  other  programs  which  use 
substantial  cross  process  communication  cannot  use  these  simplifying  assumptions. 

Given  that  this  type  of  model  is  applicable  to  the  simulations  already  performed,  our  next 
task  is  to  analyze  the  system  and  program  characteristics  to  define  the  model’s  structure.  A  context 
switch  mechanism  must  be  introduced  into  the  simulation,  and  the  effects  of  each  interruption  in 
execution  incorporated  appropriately. 




6.3  Implementation 


One  of  the  most  basic  forms  of  modeling  multiprocessing  is  to  totally  flush  the  cache  at 
regular  intervals,  modeling  the  effect  of  context  switches  between  processes  executing  in  a  round 
robin fashion [3, 21, 56]. This is realistic for a virtually addressed cache without process identifiers,
and  a  reasonable  approximation  for  a  small  cache  when  a  context  switch  will  probably  overwrite 
all  data,  but  not  appropriate  for  larger  caches  when  data  survival  is  likely.  A  more  accurate  and 
versatile  model  is  necessary,  but  will  be  more  complex. 
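The flush-based approach can be illustrated with a minimal sketch. It assumes a direct-mapped cache and a trace given as a list of byte addresses; the function name `miss_rate_with_flush` and its parameters are hypothetical and are not taken from the simulator in appendix A.

```python
def miss_rate_with_flush(trace, num_lines, block_bytes, flush_interval=None):
    """Miss rate of a direct-mapped cache, optionally flushed at fixed intervals.

    trace          -- list of byte addresses (assumed input format)
    num_lines      -- number of lines in the cache
    block_bytes    -- block size in bytes
    flush_interval -- flush the whole cache every this many references
                      (None means the program runs undisturbed)
    """
    tags = [None] * num_lines              # None marks an invalid line
    misses = 0
    for count, addr in enumerate(trace):
        # Model a context switch as a total flush at regular intervals.
        if flush_interval and count and count % flush_interval == 0:
            tags = [None] * num_lines
        block = addr // block_bytes        # block number doubles as the tag
        index = block % num_lines
        if tags[index] != block:
            misses += 1
            tags[index] = block
    return misses / len(trace) if trace else 0.0
```

For a small looping trace that fits in the cache, every flush forces the whole working set to be reloaded, which is the pessimistic behavior described above for larger caches.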

For  a  model  to  be  effective,  however,  it  cannot  be  so  complex  that  direct  simulation  becomes 
a  better  alternative.  If  a  detailed  description  of  the  test  program  is  required  just  to  develop  the 
model,  then  simulation  may  be  just  as  effective.  It  is  also  important  that  the  model  directly  relates 
to  the  system  it  represents.  In  [31],  a  very  comprehensive  model  is  developed.  Unfortunately,  it 
requires  a  thorough  analysis  of  the  program  trace  to  define  the  model  parameters,  thus  limiting  its 
usefulness. Also, it fails to consider some very basic variations in cache architecture. A balance is
necessary: the model must be complex enough to be accurate, but based on basic properties of the
system  and  programs  that  are  easily  observed.  With  this  in  mind,  the  model  can  be  developed  by 
answering  the  two  questions  mentioned  above. 

6.3.1  Frequency 

The  answer  to  the  first  question  is  based  on  the  execution  interval  of  a  program,  or  how 
long  it  is  executed  before  a  context  switch  occurs.  This  is  heavily  dependent  on  how  execution  is 
scheduled,  which  is  controlled  by  the  operating  system  [19].  A  process  is  executed  until  it  either 
is switched voluntarily (e.g., while it waits for some system resource, or requests a system call),
it is preempted by a higher priority process (e.g., an interrupt service routine), or it is switched
involuntarily for another user process (e.g., the end of a fixed time allocation is encountered). The
initial  priority  of  a  process  depends  on  its  type  (system  versus  user)  and  its  requirements  (interactive 
versus  compute  intensive).  The  priority  can  degrade  while  the  process  is  being  executed  and  is 
promoted  while  it  is  stalled,  which  prevents  a  single  process  from  dominating  the  system  resources. 
In  a  fixed  priority  scheme,  processes  of  equal  priority  are  processed  according  to  a  policy,  either  first 
in  first  out  (the  program  executes  until  completion)  or  round  robin  (programs  are  switched  after  a 
fixed interval, taking turns) [16, 71]. The time sharing in OSF/1 is on a thread basis; however, the
test programs are all single threaded, and the various threads of the operating system are considered
as a conglomerate from the cache's perspective.

For  the  model,  we  use  a  basic  scheme  based  on  this  information.  We  assume  that  all 
operating  system  level  processes  have  a  higher  priority  than  any  test  program  process,  so  they  can 
interrupt  test  program  execution  at  any  time.  These  processes  will  include  both  interrupt  service 
routines  and  system  calls.  All  test  programs  run  at  the  same  priority,  with  a  round  robin  scheduling. 
For  a  single  program,  this  defines  the  characteristics  of  its  execution  interval.  The  interval  has  some 
maximum  value  where  a  context  switch  is  automatic,  but  up  to  that  point  there  is  some  probability 
that  a  switch  will  occur  earlier  due  to  either  an  interrupt,  system  call,  or  stall  waiting  for  resources. 
Based  on  results  from  previous  studies  [8,  31,  41],  this  probability  follows  an  exponential  distribution. 
Most processes execute for a short interval, and the frequency of longer intervals falls off
exponentially, so very few processes consume the maximum interval. Context switches are therefore a
regular occurrence. With
round  robin  scheduling,  the  number  of  test  programs  considered  in  the  model  does  not  affect  the 
execution  interval. 

To  incorporate  this  fact  into  the  model,  a  random  variable  R  is  defined  representing  the 
execution  interval  length  in  number  of  references  r  with  an  exponential  probability  density  function. 
A  distribution  of  this  kind  has  the  form  [53]: 

$$f(r) = \frac{1}{\mu} e^{-r/\mu} \qquad (2)$$

where $\mu$ is a constant which defines the shape of the curve and its expected value. The probability
that any given reference interval R will be r references or less is defined by:

$$P[R \le r] = \int_{-\infty}^{r} f(r)\,dr = 1 - e^{-r/\mu} \qquad (3)$$

If we assume that an interval will be as long as possible, then this can be used as the
probability that a given execution interval R is r references long, expressed as:

$$p = 1 - e^{-r/\mu} \qquad (4)$$

This function could be incorporated into the program by determining the probability of a given
interval as that reference is reached. A random number in [0..1] is then generated at each reference
to determine if a switch is necessary. A better solution is to invert the equation to yield:

$$r = -\mu \ln(1 - p) \qquad (5)$$




Thus  generating  a  random  number  in  [0..1]  will  generate  an  appropriate  execution  interval  length  r 
(rounded  to  an  integer  value),  as  shown  in  Figure  34. 


Figure  34:  Execution  Interval  Given  Some  Probability  [0..1] 


The remaining unknown is $\mu$, which can be determined by defining the desired maximum
execution interval. In [8, 41] this was 400,000 traced instructions, or 25,000 untraced, although these
values are based on a system that is no longer contemporary. If we assume that each program executes
for a maximum 10 ms time slice on a system with a 20 ns cycle and an average of 2 cycles used per
instruction [71], this generates a maximum interval of 250,000 references:

$$\frac{10 \times 10^{-3}\ \text{s/interval}}{(2\ \text{cycles/instruction})(20 \times 10^{-9}\ \text{s/cycle})} = 250{,}000 \qquad (6)$$

At this point, the probability of a context switch defined above should approach 1, or

$$\lim_{r \to r_{max}} e^{-r/\mu} = 0 \qquad (7)$$

Obviously this cannot be exact, but selecting a $\mu$ of $r_{max}/5$, or 50000, is accurate to 0.006738 which is
sufficient for this application. Since the exponential function cannot define the maximum value, an
explicit limit is set on the function, so that the final definition of each execution interval is given by:

$$r = \min(-50000 \ln(1 - p),\ 250000) \qquad (8)$$


which  is  the  function  used  to  generate  Figure  34. 
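As an illustration of Equation 8, the following is a minimal sketch of the interval generator. The function names and the use of Python's `random` module are assumptions made for this example; the thesis implementation is the code in appendix A.

```python
import math
import random

MU = 50_000        # shape constant from the text (maximum interval / 5)
R_MAX = 250_000    # maximum execution interval in references


def execution_interval(p, mu=MU, r_max=R_MAX):
    """Map a probability p in [0, 1) to an interval length per Equation 8."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must lie in [0, 1)")
    # Inverse of p = 1 - exp(-r / mu), rounded to an integer and capped.
    return min(round(-mu * math.log(1.0 - p)), r_max)


def sample_interval(rng=random):
    """Draw one execution interval; rng.random() returns a value in [0, 1)."""
    return execution_interval(rng.random())
```

A call such as `execution_interval(0.5)` gives the median interval, while any p above roughly 0.9933 is clipped to the 250,000 reference limit.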

Incorporating  this  into  software,  at  program  start  and  after  every  context  switch,  a  random 
value  is  generated  in  [0..1].  This  is  applied  to  the  above  function  to  determine  the  execution  interval. 
A  counter  is  maintained  of  the  number  of  instruction  references  since  the  last  context  switch,  and 
when  these  two  values  are  equal,  the  switch  impact  model  discussed  below  is  performed.  The  actual 
distribution generated by the random function is shown in Figure 35, showing the probability of a
specific interval as determined from 250,000,000 generated intervals. The probability
of any particular interval is low, but the cumulative probability of a context switch as the interval
increases to its maximum value approaches 1 as expected. The spike at 250000 references is due to
the limit in the function, and is negligible in the cumulative distribution.
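The counter mechanism described above can be sketched as follows. This is a hypothetical illustration: `switch_points` and `draw_interval` are names introduced here, and the comparison uses "greater than or equal" rather than strict equality so that a drawn interval of zero still triggers a switch, which is an assumption of this sketch.

```python
def switch_points(num_refs, draw_interval):
    """Return the reference counts (1-based) at which a context switch fires.

    num_refs      -- number of instruction references to step through
    draw_interval -- callable returning the next execution interval length
    """
    points = []
    target = draw_interval()       # interval drawn at program start
    since_switch = 0               # references since the last context switch
    for ref in range(1, num_refs + 1):
        since_switch += 1
        if since_switch >= target:
            points.append(ref)     # the switch impact model would run here
            since_switch = 0
            target = draw_interval()   # new interval after every switch
    return points
```

With the capped exponential of Equation 8, the share of draws that land on the 250,000 limit is $e^{-5} \approx 0.0067$, which is the size of the spike noted above.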



6.3.2  Impact 

The  second  question  addresses  the  likelihood  that  data  in  the  cache  is  overwritten  by  the 
processes  executed  during  the  interruption.  As  stated  before,  simply  invalidating  the  entire  cache 
is  not  a  realistic  model.  Instead,  the  model  must  take  into  account  the  footprints  of  all  processes 
executed  during  the  interruption  to  determine  what  portion  of  the  cache  is  overwritten.  This  is 
addressed  by  both  Agarwal  [3]  and  Thiebaut  and  Stone  [56].  Both  models  attempt  to  evaluate  all 
aspects  of  the  cache  analytically.  By  using  simulations,  much  of  the  model  can  be  discarded.  Instead, 
only  the  relevant  function  regarding  the  probability  of  cache  line  replacement  is  used.  Both  papers 
use  identical  functions  to  determine  the  probability  that  a  program’s  working  set  will  have  a  certain 
number  of  unique  references  to  a  given  cache  line.  The  derivation  of  this  function  is  quite  lengthy, 
for  more  information  please  consult  either  paper.  It  is  based  on  the  binomial  probability  that  any 
given  cache  reference  will  be  assigned  to  a  certain  cache  line. 

The  calculation  is  a  function  of  the  number  of  cache  lines  N,  the  cache  associativity  A, 
and  the  footprint  F  of  the  interruption,  defined  as  the  number  of  unique  blocks  referenced  by  the 
program  in  the  interval  under  consideration.  The  probability  that  a  given  cache  line  will  contain  i 
references from a certain footprint is defined as:

if 0 < i < A :

(9)

if i = A :

The  probability  that  a  certain  number  of  blocks  will  be  used  on  any  given  line  directly  determines 
the  probable  number  of  blocks  that  must  be  evicted  from  that  line  during  the  interruption. 

Unfortunately, this function cannot be inverted to give a direct calculation of the number of
blocks overwritten in each line based on a single variable in [0..1]. Instead, a random probability p
is generated for each line in each cache and the following algorithm is used to iterate over all values
of a in the range [0..A-1] to determine the number of overwrites to be performed on that line:

$$\text{if } p > \sum_{i=0}^{a} P_i, \quad a + 1 \text{ overwrites are performed} \qquad (12)$$
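The iteration in Equation 12 can be sketched as below. The helper `line_probabilities` is an assumption: it uses a plain binomial form suggested by the description above (each of the F footprint blocks maps to a given line with probability 1/N, with all remaining probability assigned to i = A), and it should not be read as the exact expressions from [3, 56]. Both function names are hypothetical.

```python
import math


def line_probabilities(footprint, num_lines, assoc):
    """ASSUMED binomial form of P_i for i = 0..assoc (not taken from the source)."""
    q = 1.0 / num_lines
    probs = [math.comb(footprint, i) * q**i * (1.0 - q) ** (footprint - i)
             for i in range(assoc)]
    probs.append(max(0.0, 1.0 - sum(probs)))   # remaining mass goes to i = A
    return probs


def overwrites_for_line(p, probs, assoc):
    """Equation 12: iterate a over [0..A-1], tracking the cumulative sum of P_i."""
    overwrites = 0
    cumulative = 0.0
    for a in range(assoc):
        cumulative += probs[a]
        if p > cumulative:
            overwrites = a + 1     # p exceeds the sum through P_a
    return overwrites
```

Because the cumulative sum never decreases, the loop returns the smallest count whose cumulative probability reaches p, up to a maximum of A overwrites.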

Based on [56], the overwrites caused by this function follow a roughly normal distribution.
Figures 36 and 37 show the probability of n overwrites per line, P(n), for a context switch with
interruption footprints of 100 and 1000 respectively. Various associativities and their possible
replacements are shown, with the replacement probability plotted against the number of lines in the
cache. The plots show the decreasing likelihood of replacement as cache size increases or footprint
size decreases.

Certain  assumptions  apply  to  the  formulas  provided  in  the  papers.  These  equations  assume 
that  a  program’s  footprint  is  uniformly  distributed  over  the  cache.  The  locality  in  reference  streams 
would  suggest  that  this  is  not  true,  which  was  supported  by  the  results  in  both  papers.  Using 
other  mapping  algorithms  (hashing),  it  may  be  possible  to  get  a  more  uniform  distribution,  but  this 
technique  was  not  used.  Finally,  shared  references  between  programs  are  neglected.  As  discussed 
before, given the test programs used and the way the kernel is considered, this is a reasonable
assumption. To analyze a threaded program, or one with a substantial shared component (such as
a  database),  such  an  assumption  is  not  valid. 


[Figures 36 and 37: the vertical axis is Percent Chance of Replacement.]

Other assumptions made in the papers are no longer relevant. The use of LRU replacement
is  assumed  in  the  analytical  model,  but  incorporated  explicitly  in  simulation.  The  LRU  blocks  are 
selected  for  overwrite,  but  other  selection  methods  are  possible.  Also,  other  considerations  such  as 
which  cache  lines  present  at  a  context  switch  will  be  referenced  after  the  interruption  period  do  not 
have  to  be  modeled,  since  they  are  determined  by  the  simulation. 

The  remaining  problem  is  determining  the  footprint  of  the  interruption.  The  footprint 
depends on the process being considered, its state of execution, and the line size of the cache, so it is
very difficult to characterize. In [3, 56] detailed analyses of program traces were used to determine
this  value.  This  is  not  compatible  with  our  goal  of  minimal  analysis  in  developing  the  model,  so 
a  different,  more  improvised,  approach  is  used.  Based  on  the  footprint  values  used  in  other  work 
[3,  56],  a  reasonable  (though  less  accurate)  range  can  be  achieved  using: 


$$F_{I} = \frac{r_{int}}{50 \cdot B} \qquad (13)$$

$$F_{D} = \frac{r_{int}}{50} \qquad (14)$$

which gives the instruction footprint as 2% of the execution interval of the interruption ($r_{int}$) divided
by the block size (B) in words (or in bytes divided by 4), and the data footprint is simply 2% of the
execution interval. This is obviously an overly simplified approach to characterizing the footprint,
but adequate for an initial review. For a unified cache, the two footprints are simply summed, which
is correct assuming independence of instruction and data references (no self modifying code). For a
range of intervals [0..250,000], this produces a footprint range of [0..5625] for the caches simulated.
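A short sketch of the footprint estimate follows, assuming 4-byte words as stated in the text. The function name `footprints` and its return layout are hypothetical.

```python
def footprints(r_int, block_bytes):
    """Return (instruction, data, unified) footprints for an interruption.

    r_int       -- execution interval of the interruption, in references
    block_bytes -- cache block size in bytes (converted to 4-byte words)
    """
    block_words = block_bytes / 4
    instruction = r_int / (50 * block_words)   # 2% of the interval per word of block
    data = r_int / 50                          # 2% of the interval
    unified = instruction + data               # assumes independent I and D streams
    return instruction, data, unified
```

For the maximum interval of 250,000 references and a 32 byte block, this gives 625 instruction blocks and 5000 data blocks, which sum to the upper bound of 5625 quoted above.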

The execution interval of the interruption is computed as

$$r_{int} = n \cdot (-\mu \ln(1 - p)) \qquad (15)$$

where n is the number of additional processes being executed according to the model and p is a
random value in [0..1] as used before. This is consistent with the round robin scheduling, as the
number of processes being executed determines the length of interruption. One problem is that the
models used in both [3, 56] neglect the operating system. For simplicity, the operating system is
modeled as just another process: to simulate a process with the operating system, n = 1; with the
operating system and one other process, n = 2; and so on. This may be pessimistic, as one might
expect system calls and interrupt service routines to be shorter than user programs; however,
the distribution of execution intervals is weighted towards shorter intervals, which is consistent with
frequent interruptions.




The  impact  is  applied  in  software  every  time  a  context  switch  is  indicated.  The  length  of 
the  interruption  is  computed,  which  in  turn  defines  the  footprint  for  the  various  unified,  instruction, 
and  data  caches.  This  is  used  to  calculate  the  probability  that  a  given  number  of  cache  blocks 
are  overwritten  for  each  cache  line  in  each  different  cache  configuration.  Then  for  each  cache  line 
a  random  number  in  [0..1]  is  generated  and  compared  to  the  probability  to  determine  how  many 
blocks  on  that  line  (up  to  the  set  size)  are  invalidated. 
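The sequence described in this paragraph can be sketched for a single cache as follows. Everything here is illustrative: the cache is represented as a list of sets, each an MRU-to-LRU ordered list of block tags, the per-line probabilities use an assumed binomial form (each footprint block maps to a given line with probability 1/N), and the function names are hypothetical rather than those used in appendix A.

```python
import math
import random


def interruption_length(n, p, mu=50_000):
    """Equation 15: interruption length for n additional processes."""
    return n * (-mu * math.log(1.0 - p))


def apply_switch_impact(cache_sets, assoc, footprint, rng=random):
    """Invalidate LRU blocks in each set and return how many were removed.

    cache_sets -- list of lists; each inner list holds tags ordered MRU to LRU
    assoc      -- associativity A (maximum blocks per set)
    footprint  -- integer number of unique blocks touched during the interruption
    """
    q = 1.0 / len(cache_sets)
    # ASSUMED binomial probability that exactly i footprint blocks hit one line.
    probs = [math.comb(footprint, i) * q**i * (1.0 - q) ** (footprint - i)
             for i in range(assoc)]
    invalidated = 0
    for lines in cache_sets:
        p = rng.random()                 # one random value per cache line
        cumulative, overwrites = 0.0, 0
        for a in range(assoc):           # Equation 12
            cumulative += probs[a]
            if p > cumulative:
                overwrites = a + 1
        evict = min(overwrites, len(lines))
        if evict:
            del lines[-evict:]           # drop the least recently used blocks
            invalidated += evict
    return invalidated
```

A footprint of zero leaves the cache untouched, while a footprint much larger than the number of lines empties nearly every set, which approaches the total flush model.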

6.4  Testing 

The mechanism described above was incorporated into the same program used for the single
process simulations described in section 5. The additional code is also included in appendix A.
Again a tool was defined to instrument the test programs (called mod) so shared library functions
could be used in analysis. Simulations with the model were performed using the same 40 caches
on all four benchmarks for n = 1, modeling the program with the operating system. Simulations
were also performed for n = 2 for Compress, GCC, and Espresso, to compare the model results to
simulations of two concurrent processes with the operating system. All simulations were performed
on the same Alpha system as before. The results of the model simulations are reviewed in the next
section, and compared with their equivalent "real" simulations.




7  Model  Evaluation 

7.1 Individual Results for n=1


The  accuracy  of  the  context  switch  model  can  be  seen  in  its  ability  to  predict  cache  miss 
rates  commensurate  with  those  generated  from  an  equivalent  “real”  simulation.  The  first  test  case 
was for n=1, modeling the test program with one additional process, the operating system, which
was  performed  for  Compress,  GCC,  Espresso,  and  Alvinn.  The  results  of  these  simulations  are 
plotted  against  the  corresponding  real  simulation  of  each  program  with  the  operating  system,  shown 
in  Figures  38  to  41. 

As  can  be  seen,  the  model  generally  provides  an  adequate  mechanism  for  predicting  the 
interference  caused  by  operating  system  overhead.  There  are  some  variations  over  the  results, 
although  certain  instances  such  as  Alvinn  data  references  are  quite  accurate.  Such  variations  are 
to  be  expected  given  the  assumptions  that  were  used  to  generate  the  model.  The  only  significant 
fluctuations occur for Compress, which is logical considering that this benchmark interacts substantially
more  with  the  operating  system  than  the  others. 

7.2  Individual  Results  for  n=2 

A better test of the model is for n=2, modeling the effects of the operating system and
an additional process on the performance of the test program. Simulations were performed for
Compress, GCC, and Espresso; Alvinn was neglected since no corresponding real simulation could
be  performed.  These  results  are  shown  in  Figures  42  to  44. 

These  results  show  the  weakness  of  the  model.  In  almost  every  case,  the  model  predictions 
are  more  optimistic  than  the  real  data.  Also,  the  model  does  not  account  for  differences  in  program 
behavior,  so  while  there  are  two  sets  of  real  data  from  two  alternative  second  programs,  the  model 
only  predicts  a  single  result.  Based  on  this,  the  model  does  not  accurately  predict  the  amount  of 
interference  generated  from  multitasking.  The  error  in  the  model  should  also  be  more  pronounced 
as  the  level  of  multitasking  is  increased,  but  no  simulations  could  be  performed  with  3  test  programs 
or  more  to  verify  this. 

7.3  Interference  Comparison 

The primary source of error in the model is apparent in the interference plots. These are
equivalent to the interference figures of the previous results, showing what percentage of cache misses
overwrote a process's own data (intrinsic interference), as opposed to overwriting another process's
data (extrinsic interference). These plots are shown for each of the seven test cases in Figures 45
through 51.

[Figures omitted: miss rate (%) comparison plots for three program pairs (GCC and Espresso, Compress and Espresso, Compress and GCC). Each figure has four panels (Instruction References, A=1; Instruction References, A=4; Data References, A=1; Data References, A=4), with curves for w/ OS and each program and for w/ model at S=4096, 8192, 16384, and 32768.]

As can be seen, the model underestimates the amount of extrinsic interference present in a
multitasked situation. With a second program in the model, the primary source of interference is
still intrinsic, as seen by the percentage of self overwrites; based on the previous results, this is
inaccurate. The only cases in which the model is even remotely correct are the largest caches for GCC
and Espresso.
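
As a rough illustration of the quantity plotted, the sketch below computes the percentage of self overwrites from per-victim overwrite counts. It assumes the percentage is simply a process's own overwritten blocks divided by all blocks it overwrote; the function name is hypothetical, and the counts are those reported for process 0 in the example output of Appendix A.2 (cache #0), with the invalid "process 3" entry left out.

```python
def percent_self_overwritten(overwrites, self_index):
    """Percentage of a process's overwrites that replaced its own cache blocks.

    overwrites[i] is the number of times this process overwrote a block
    belonging to process i (assumed layout, mirroring the output listing).
    """
    total = sum(overwrites)
    if total == 0:
        return 0.0
    return 100.0 * overwrites[self_index] / total


# Overwrite counts for process 0, indexed by victim process
# (taken from the Appendix A.2 example, cache #0).
p0_overwrites = [2614797, 2207422, 988510]

intrinsic = percent_self_overwritten(p0_overwrites, 0)  # self (intrinsic) share
extrinsic = 100.0 - intrinsic                           # share hitting other processes
print(f"intrinsic: {intrinsic:.1f}%  extrinsic: {extrinsic:.1f}%")
```

Under this assumed definition, a model whose intrinsic share stays high after a second program is added is one that underestimates extrinsic interference, which is the discrepancy described above.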

Given that the operating system is modeled fairly accurately, but the impact of
other programs is not, the most likely source of error is in the impact to the cache at each context
switch. The switch frequency is assumed to be more accurate. This is also supported by the assumptions
used to develop the model. More specifically, the most likely source of error is the footprint
characterization. Using a simple function of the interruption interval is evidently an oversimplification.
A more accurate model could be developed by using a more flexible model of footprint size and composition
based on program features.

7.4  Summary 

Based on the above results, the model described in Section 6 does not adequately introduce
the impact of context switches into a single process simulation. The interference generated
approaches the level caused by the operating system, but is not significant enough to represent
additional user programs. Given the assumptions used to develop the model, the most likely source
of error is in the realization of context switch impact, in particular the computation of the program
footprint. The method used was overly simplified, especially the relationship between block size and
program footprint.

The  difficulty  of  developing  an  accurate  context  switch  model  highlights  the  complexity  of 
the  cache  environment.  Cache  performance  is  an  intricate  subject,  and  some  aspects  are  not  well 
understood.  Analytical  models  can  facilitate  evaluation,  but  at  the  expense  of  accuracy.  Any  model 
will  have  to  find  a  balance  between  these  two  goals.  The  requirement  for  accuracy  reaffirms  the  need 
for  analysis  tools  as  described  earlier,  despite  their  own  limitations. 


Figure 45: Percent Self Overwritten for Compress; n=1

Figure 46: Percent Self Overwritten for GCC; n=1

Figure 47: Percent Self Overwritten for Espresso; n=1

Figure 48: Percent Self Overwritten for Alvinn; n=1

[Plots omitted; the horizontal axis of each plot is Cache #.]


7.5  Future  Work 


While the model was not particularly successful in predicting interference, it does provide
a theoretical foundation for further exploration. As discussed above, the primary limitation is the
simplistic treatment of process footprints. Were this resolved, so that the footprints account for both
the program in question and the cache block size, the model should perform much better.

Another potential improvement is a more detailed characterization of the operating system,
to include its various composite threads. Also, the footprints of the operating system processes must
be treated differently from those of user programs, due to their unique nature. The execution interval
function can also be improved by including specific program characteristics, such as the frequency
of system calls and interrupts generated by that particular program. Finally, additional aspects of
the various existing analytical models can be incorporated to further simplify the simulations. A
better understanding of the execution environment will allow more realistic assumptions to be used
in that case.




8  Conclusions 


The  primary  thrust  of  this  research  was  the  development  and  refinement  of  the  ATOM  based 
simulation  capability  for  a  complex  workload.  This  was  accomplished  through  the  development  of  a 
very  flexible  and  robust  analysis  program.  This  program  is  based  on  standard  simulation  tools,  but 
incorporates  novel  techniques  to  allow  a  more  comprehensive  analysis.  Partially  based  on  the  current 
work  of  others,  many  of  these  techniques  still  required  extensive  test  and  adaptation  before  their 
performance  was  adequate.  Other  areas,  such  as  re-entrant  analysis,  were  totally  original.  Several 
avenues  of  future  work  have  also  been  highlighted,  based  on  developing  this  work  into  an  even  more 
mature  tool. 

The cache simulations were performed as a demonstration of the overall potential of the
simulation capability, as well as reinforcing assumptions about cache performance with operating
system overhead and in the multiprocess environment. The context switch model attempted to
combine both empirical and theoretical understanding of caches, and the testing portrayed a specific
application of the ATOM tools created. These results were generally consistent with past endeavors,
although they highlighted some possible deficiencies in current methods and assumptions. The execution
environment is quite complex, and aspects of its behavior are not particularly well understood.
The ATOM tool promises to be a very effective and flexible tool for robust computer architecture
analysis; however, further work is necessary to fully realize its potential.

In the final analysis, the consideration of cache miss rates must be weighed with the impact
of those miss rates on overall memory system performance. The actual goal of a cache is to improve
memory access times. A cache with a very low miss rate but with a slow access time is just as much
a problem as a cache with a high miss rate but very fast access time. Traffic between the various
levels of the memory hierarchy will also be a factor, as the time to service a miss is also important.
Other factors such as the area and power required for the cache must also be considered for an
accurate appraisal of the cost and benefits of incorporating a certain cache design into a system.
This work has been the first step towards such appraisals which include a comprehensive workload.
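
The trade-off between miss rate and access time can be made concrete with the standard textbook expression for average memory access time (hit time plus miss rate times miss penalty). This formula is not taken from this thesis, and the two cache designs below are hypothetical values chosen only for illustration, not measurements from the simulations.

```python
def average_access_time(hit_time, miss_rate, miss_penalty):
    """Average memory access time in cycles.

    Standard textbook form: every access pays the hit time, and the
    fraction of accesses that miss additionally pays the miss penalty.
    """
    return hit_time + miss_rate * miss_penalty


# Hypothetical design A: very low miss rate, but slow to access.
slow_low_miss = average_access_time(hit_time=3.0, miss_rate=0.02, miss_penalty=20.0)

# Hypothetical design B: higher miss rate, but very fast to access.
fast_high_miss = average_access_time(hit_time=1.0, miss_rate=0.10, miss_penalty=20.0)

print(f"A: {slow_low_miss:.2f} cycles, B: {fast_high_miss:.2f} cycles")
```

With these assumed numbers the cache with five times the miss rate still has the lower average access time, which is why miss rate alone is not a sufficient figure of merit.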




9  Contributions  of  this  Thesis 


•  The  majority  of  the  work  described  in  this  thesis  has  revolved  around  developing  the  ATOM 
tracing  capability  for  the  operating  system  and  multiple  user  programs.  Previous  work  in 
this  particular  area  is  almost  non-existent.  ATOM  itself  is  a  well  defined  tool,  but  this  type 
of  implementation  has  not  been  studied  before.  A  general  method  to  instrument  the  kernel 
is  outlined  by  Eustace  and  Chen  in  [20],  but  not  well  explored.  Their  material  was  used  as 
a  foundation,  but  expanded  upon  to  develop  the  next  generation  of  tools.  The  testing  and 
refinement  performed  over  the  past  year  have  made  advances  in  several  areas: 

- The cache simulation tools developed are much more comprehensive than any existing
ATOM programs, providing more flexibility and detailed results.

- The techniques proposed by Eustace and Chen have been extended to include not only
the operating system but also multiple user programs.

- The issue of re-entrant analysis functions was explored for the first time. This will play
a critical role in the exploration of certain applications such as the operating system.

- Other limitations associated with using ATOM on the kernel are now more fully understood.
Some were addressed in this work, while others will require further study to be
completely resolved.

•  The  cache  simulations  served  as  a  validation  of  the  tools  developed.  The  results  confirmed  the 
necessity  for  this  type  of  work,  revealing  the  significance  of  multiprogramming  in  workloads. 
The  data  gathered  has  affirmed  theories  about  cache  performance,  and  can  be  used  to  design 
more  efficient  memory  caches. 

• The context switch model attempts to combine both theoretical and empirical cache studies in
an effort to achieve a balance between simplicity and accuracy. It is an extension of the basic
cache model which synthetically generates the impact of multiprogramming. While not entirely
successful, the testing does highlight gaps in current understanding of cache performance in
a complex environment. This will serve as a background for more appropriate models, which
should successfully reduce simulation processing.

• The most significant aspects of this thesis are the potential contributions to future work. With
the capability developed here, a wide variety of additional cache studies are possible. With
some relatively minor modification, the tools developed can be adapted to a wide variety of
program analyses. Most importantly, this work will provide the foundation to allow these
studies to include the operating system, a subject that has not been well addressed in the past.




10  Acknowledgments 


I  would  like  to  first  thank  my  advisor,  David  Kaeli,  for  his  guidance  and  motivation  in 
completing  this  project.  I  would  also  like  to  thank  Bradley  Chen,  Alan  Eustace,  and  Greg  Lueck 
for  their  considerable  assistance  in  working  with  ATOM,  and  Liz  Stewart  for  administration  of  the 
testbed  system.  Finally,  I  would  like  to  thank  Kristi  Forbes  for  her  invaluable  moral  support. 

This thesis was funded by The Charles Stark Draper Laboratory through a Draper Fellowship
under IR&D number 713, Fault Tolerant Computing. Publication of this thesis does not
constitute approval by Draper of the findings or conclusions contained herein. It is published for the
exchange and stimulation of ideas. The author assigns his copyright of this thesis to The Charles
Stark Draper Laboratory, Inc., Cambridge, Massachusetts. Permission is hereby granted by The
Charles Stark Draper Laboratory, Inc., to Northeastern University to reproduce any or all of this
thesis.




11  Bibliography 
References 

[1] A. Agarwal, Analysis of Cache Performance for Operating Systems and Multiprogramming, Kluwer, 1989.

[2] A. Agarwal, J. Hennessy, and M. Horowitz, “Cache Performance of Operating System and Multiprogramming Workloads”, ACM Transactions on Computer Systems, Vol. 6 No. 4, Nov 88, pp. 393-431.

[3] A. Agarwal, M. Horowitz, and J. Hennessy, “An Analytical Cache Model”, ACM Transactions on Computer Systems, Vol. 7 No. 2, May 89, pp. 184-215.

[4] E. Appleton, “DEC OSF/1: A Taste for Business”, The DEC Professional, Vol. 13 No. 1, Jan 94, pp. 40-44.

[5] P. Argade, D. Charles, and C. Taylor, “A Technique for Monitoring Run Time Dynamics of an Operating System and a Microprocessor Executing User Applications”, ACM SIGPLAN Notices, Vol. 29 No. 11, Nov 94, pp. 122-131.

[6] D. Bernstein, S. Gal, and M. Rodeh, “Mathematical Analysis of Statistical Sampling for Estimating Computer Cache Performance”, Communications In Statistics, Vol. 12 No. 1, 1996, pp. 67-75.

[7] B. Bershad and B. Chen, “Avoiding Conflict Misses Dynamically in Large Direct Mapped Caches”, ACM SIGPLAN Notices, Vol. 29 No. 11, Nov 94, pp. 158-170.

[8] A. Borg, R. Kessler, and D. Wall, “Generation and Analysis of Very Long Address Traces”, Computer Architecture News, Vol. 18 No. 2, Jun 90, pp. 270-279.

[9] P. Bourne, “UNIX: More on DEC OSF/1 Migration”, The DEC Professional, Vol. 13 No. 1, Jan 94, pp. 49-50.

[10] B. Chen, Assembly code provided in personal correspondence via email, Apr 12, 1996.

[11] B. Chen, The Impact of Software Structure and Policy on CPU and Memory System Performance, PhD Thesis Carnegie Mellon # CMU-CS-94-145, 1994.

[12] B. Chen and B. Bershad, “The Impact of Operating System Structure on Memory System Performance”, Operating Systems Review, Vol. 27 No. 5, Dec 93, pp. 120-133.

[13] B. Chen, D. Wall, and A. Borg, “Software Methods for System Address Tracing: Implementation and Validation”, DEC WRL Research Report 94/6, 1994.

[14]  T.  Chen  and  J.  Baer,  “A  Performance  Study  of  Software  and  Hardware  Data  Prefetching 
Schemes”,  Computer  Architecture  News,  Vol.  22  No.  2,  Jun  94,  pp.  223-232. 

[15]  F.  Dahlgren,  M.  Dubois,  and  P.  Stenstrom,  “Combined  Performance  Gains  of  Simple  Cache 
Extensions”,  Computer  Architecture  News,  Vol.  22  No.  2,  Jun  94,  pp.  187-197. 

[16]  J.  Denham,  P.  Long,  and  J.  Woodward,  “DEC  OSF/1  Version  3.0  Symmetric  Multiprocessing 
Implementation”,  Digital  Technical  Journal,  Vol.  6  No.  3,  Sum  94,  pp.  29-43. 

[17]  T.  Dutton,  D.  Eiref,  H.  Kurth,  J.  Reisert,  and  R.  Stewart,  “The  Design  of  the  DEC  3000  AXP 
Systems,  Two  High  Performance  Workstations”,  Digital  Technical  Journal,  Vol.  4  No.  4,  92 
spec,  pp.  67-81. 


[18] J. Dwyer and J. Richman, “OSF/1”, UNIX Review, Vol. 10 No. 4, Apr 92, pp. 29-47.

[19] H. El-Rewini, H. Ali, and T. Lewis, “Task Scheduling in Multiprocessing Systems”, Computer, Vol. 28 No. 12, Dec 95, pp. 27-37.

[20] A. Eustace and B. Chen, “ATOM Kernel Instrumentation Guide Version 0.4”, unpublished, Sep 1995.

[21]  M.  Evers,  P.  Chang,  and  Y.  Patt,  “Using  Hybrid  Predictors  to  Improve  Branch  Prediction 
Accuracy  in  the  Presence  of  Context  Switches”,  Computer  Architecture  News,  Vol.  24  No.  2, 
Jun  96,  pp.  3-11. 

[22]  J.  Feldman  and  C.  Retter,  Computer  Architecture:  A  Designers  Text  Based  on  a  Generic  RISC, 
McGraw  Hill,  1994. 

[23]  J.  Fraser,  “Simple  Modeling  of  Multiprocess  Effects  in  Cache  Simulations”,  unpublished,  1995. 

[24]  J.  Fraser  and  D.  Kaeli,  “Operating  System  Impact  on  Cache  Performance”,  unpublished,  1996. 

[25] J. Gee, M. Hill, D. Pnevmatikatos, and A. Smith, “Cache Performance of the SPEC92 Benchmark Suite”, IEEE Micro, Vol. 13 No. 4, Aug 93, pp. 17-27.

[26] M. Holliday and C. Ellis, “Accuracy of Memory Reference Traces of Parallel Computations in Trace Driven Simulation”, IEEE Transactions on Parallel and Distributed Systems, Vol. 3 No. 1, Jan 92, pp. 97-109.

[27]  G.  Intrater  and  I.  Spillinger,  “Performance  Evaluation  of  a  Decoded  Instruction  Cache  for 
Variable  Instruction  Length  Computer”,  IEEE  Transactions  on  Computers,  Vol.  43  No.  10, 
Oct  94,  pp.  1140-1150. 

[28] Q. Jin and Y. Sugasawa, “Representation and Analysis of Behavior for Multiprocess Systems
by  Using  Stochastic  Petri  Nets”,  Mathematical  and  Computer  Modeling,  Vol.  22  No.  10-12, 
Nov-Dec  95,  pp.  109-118. 

[29] N. Jouppi, “Cache Write Policies and Performance”, Computer Architecture News, Vol. 21 No. 2, Jun 93, pp. 191-201.

[30] K. Kavi, A. Hurson, P. Patadia, E. Abraham, and P. Shanmugam, “Design of Cache Memories
for  Multithreaded  Dataflow  Architecture”,  Computer  Architecture  News,  Vol.  23  No.  2,  May 
95,  pp.  253-264. 

[31]  M.  Kobayashi,  “A  Cache  Multitasking  Model”,  Performance  Evaluation  Review,  Vol.  20  No.  2, 
Nov  92,  pp.  27-37. 

[32]  J.  Kuntz,  “Performance  Evaluation  of  Cache  Architectures  in  Tightly  Coupled  Multiprocessor 
Systems”,  Future  Generations  Computer  Systems,  Vol.  10  No.  1,  Oct  94,  pp.  15-27. 

[33] S. Laha, J. Patel, and R. Iyer, “Accurate Low-Cost Methods for Performance Evaluation of Cache Memory Systems”, IEEE Transactions on Computers, Vol. 37 No. 11, Nov 88, pp. 1325-1335.

[34] A. Lebeck and D. Wood, “Cache Profiling and the SPEC Benchmarks: A Case Study”, Computer, Vol. 27 No. 10, Oct 94, pp. 15-26.

[35] S. Mahmud, “Comments on ‘Synthetic Traces for Trace Driven Simulation of Cache Memories’”, IEEE Transactions on Computers, Vol. 43 No. 1, Jan 94, pp. 125-126.

[36]  M.  Markowitz,  “Cache  Design”,  EDN,  Vol.  36  No.  9,  Apr  91,  pp.  136-148. 




[37] A. Maynard, C. Donnelly, and B. Olszewski, “Contrasting Characteristics and Cache Performance of Technical and Multi-User Commercial Workloads”, ACM SIGPLAN Notices, Vol. 29 No. 11, Nov 94, pp. 145-156.

[38]  D.  McCrackin  and  S.  Srinivasan,  “Trace  Driven  Pipeline  and  Cache  Simulation  of  Multithreaded 
Computers”,  Simulation,  Vol.  63  No.  2,  Aug  94,  pp.  75-82. 

[39]  E.  McLellan,  “The  Alpha  AXP  Architecture  and  21064  Processor”,  IEEE  Micro,  Vol.  13  No. 
3,  Jun  93,  pp.  36-47. 

[40]  E.  McRae,  “Benchmarking  Real  Time  Operating  Systems”,  Dr  Dobbs  Journal,  Vol.  21  No.  5, 
May  96,  pp.  48-58. 

[41] J. Mogul and A. Borg, “The Effect of Context Switches on Cache Performance”, ACM SIGPLAN Notices, Vol. 26 No. 4, Apr 91, pp. 75-84.

[42]  D.  Nicol  and  E.  Carr,  “Empirical  Study  of  Parallel  Trace  Driven  LRU  Cache  Simulators”, 
Simulation  Digest,  Vol.  25  No.  1,  Jul  95,  pp.  166-169. 

[43]  D.  Nicol,  A.  Greenberg,  and  B.  Lubachevsky,  “Massively  Parallel  Algorithms  for  Trace  Driven 
Cache  Simulations”,  IEEE  Transactions  on  Parallel  and  Distributed  Systems,  Vol.  5  No.  8,  Aug 
94,  pp.  849-858. 

[44]  S.  Oualline,  Practical  C  Programming,  O’Reilly  and  Associates,  1991. 

[45]  D.  Pnevmatikatos  and  M.  Hill,  “Cache  Performance  of  the  Integer  SPEC  Benchmarks  on  a 
RISC”,  Computer  Architecture  News,  Vol.  18  No.  2,  Jun  1990,  pp.  53-68. 

[46] C. Prete, G. Prina, and L. Ricciardi, “A Trace Driven Simulator for Performance Evaluation of Cache Based Multiprocessor Systems”, IEEE Transactions on Parallel and Distributed Systems, Vol. 6 No. 9, Sep 95, pp. 915-929.

[47] S. Przybylski, M. Horowitz, and J. Hennessy, “Performance Tradeoffs in Cache Design”, Computer Architecture News, Vol. 16 No. 3, Jun 88, pp. 290-298.

[48]  R.  Quong,  “Expected  I  Cache  Miss  Rates  via  the  Gap  Model”,  Computer  Architecture  News, 
Vol.  22  No.  2,  Apr  94,  pp.  372-383. 

[49] R. Saavedra and A. Smith, “Measuring Cache and TLB Performance and Their Effect on Benchmark Runtimes”, IEEE Transactions on Computers, Vol. 22 No. 10, Oct 95, pp. 1223-1235.

[50] D. Spinellis, “Trace: A Tool for Logging Operating System Call Transactions”, Operating Systems Review, Vol. 28 No. 4, Oct 94, pp. 56-62.

[51]  A.  Srivastava  and  A.  Eustace,  “ATOM:  A  System  for  Building  Customized  Program  Analysis 
Tools”,  ACM  SIGPLAN  Notices,  Vol.  29  No.  6,  Jun  94,  pp.  196-205. 

[52]  W.  Stallings,  Computer  Organization  and  Architecture:  Principles  of  Structure  and  Function, 
Macmillan,  1990. 

[53] H. Stark and J. Woods, Probability, Random Processes, and Estimation Theory for Engineers, Prentice Hall, 1994.

[54]  C.  Stunkel  and  K.  Fuchs,  “TRAPEDS:  Producing  Traces  for  Multicomputers  Via  Execution 
Driven  Simulation”,  Performance  Evaluation  Review,  Vol.  17  No.  1,  May  89,  pp.  70-78. 




[55] O. Temam, C. Fricker, and W. Jalby, “Cache Interference Phenomena”, Performance Evaluation Review, Vol. 22 No. 1, May 94, pp. 261-271.

[56]  D.  Thiebaut  and  H.  Stone,  “Footprints  in  the  Cache”,  ACM  Transactions  on  Computer  Systems, 
Vol.  5  No.  4,  Nov  87,  pp.  305-329. 

[57]  D.  Thiebaut,  J.  Wolf,  and  H.  Stone,  “Synthetic  Traces  for  Trace  Driven  Simulation  of  Cache 
Memories”,  IEEE  Transactions  on  Computers,  Vol.  41  No.  4,  Apr  92,  pp.  388-410. 

[58] D. Thiebaut, J. Wolf, and H. Stone, “Corrigendum to ‘Synthetic Traces for Trace Driven Simulation of Cache Memories’”, IEEE Transactions on Computers, Vol. 42 No. 5, May 93, pp. 635-636.

[59]  J.  Torrellas,  A.  Gupta,  and  J.  Hennessy,  “Characterizing  the  Caching  and  Synchronization 
Performance  of  a  Multiprocessor  Operating  System”,  ACM  SIGPLAN  Notices,  Vol.  27  No.  9, 
Sep  92,  pp.  162-174. 

[60] R. Uhlig and T. Mudge, “Trace Driven Memory Simulation: A Survey”, unpublished, 1996.

[61]  W.  Wang  and  J.  Baer,  “Efficient  Trace  Driven  Simulation  Methods  for  Cache  Performance 
Analysis”,  ACM  Transactions  on  Computer  Systems,  Vol.  9  No.  3,  Aug  91,  pp.  27-36. 

[62]  D.  Whalley,  “Fast  Instruction  Cache  Performance  Evaluation  Using  Compile  Time  Analysis”, 
Performance  Evaluation  Review,  Vol.  20  No.  1,  Jun  92,  pp.  13-22. 

[63] Y. Wong and S. Hwang, “Prediction of Memory Consumption in Conservative Parallel Simulation”, Simulation Digest, Vol. 25 No. 1, Jul 95, pp. 199-202.

[64]  E.  Wu,  Y.  Hsu,  and  Y.  Liu,  “Efficient  Stack  Simulation  for  Set  Associative  Virtual  Address 
Caches  With  Real  Tags”,  IEEE  Transactions  on  Computers,  Vol.  44  No.  5,  May  95,  pp.  719-723. 

[65]  Alpha  AXP  Architecture  Handbook,  Digital  Equipment  Corporation,  1994. 

[66]  ATOM  Reference  Manual,  Digital  Equipment  Corporation,  1993. 

[67]  ATOM  User  Manual,  Digital  Equipment  Corporation,  1994. 

[68]  ATOM  User  Manual,  Digital  Equipment  Corporation,  1995. 

[69]  DEC  3000  Model  300  Series  AXP  Hardware  Reference  Guide,  Digital  Equipment  Corporation, 
1994. 

[70]  DEC  OSF/1  Installation  Guide,  Digital  Equipment  Corporation,  1994. 

[71]  DEC  OSF/1  Guide  To  Real-time  Programming,  Digital  Equipment  Corporation,  1994. 

[72]  DEC  OSF/1  Technical  Overview,  Digital  Equipment  Corporation,  1994. 

[73]  Program  Analysis  Using  Atom  Tools,  Digital  Equipment  Corporation,  1996. 

[74]  On  line  documentation  (SPEC92,  ATOM,  Dinero). 




A  Program  Source  Code 

Programs  are  based  primarily  on  the  structure  developed  in  [20]  and  past  work  from  [23,  24]. 
Other  sources  for  information  include  [44,  66,  67,  68,  73,  74].  The  input  and  output  file  formats  are 
shown first with short examples, followed by the various files and programs used. They are provided
as a reference for future efforts as well as to aid understanding of the material:

1.  Input  Format  and  Example 

2.  Output  Format  and  Example 

3.  Cache  Model  Library 

4.  Kernel  Instrumentation  File 

5.  Kernel  Analysis  File 

6.  Program  Instrumentation  File 

7.  Program  Analysis  File 

8.  Sample  Tool  Description  File 

9.  Context  Switch  Model  Library 

10.  Model  Analysis  File 




A.1 Input Format

The input file must be called cache.in and has the format:

•  (simulation  name) 

•  (number  of  processes  in  simulation) 

• (name of each process; n-1 names, since process 0 is assumed to be the OS)

•  (number  of  caches  in  simulation) 

•  (cache  definitions) 


Names can contain up to 80 characters. Cache definitions consist of two lines. The first is a 0 or 1
denoting the cache type. The second contains the cache parameters in the forms shown below based
on cache type:

Unified (0): (U cache size) (U block size) (U associativity)

Split (1): (I cache size) (I block size) (I associativity) (D cache size) (D block size) (D associativity)

A short example input file is shown below:

multi process test
3
cc1 -O -quiet stmt.i -o stmt
espresso tial.in > /dev/null
3
0
16384 64 2
1
16384 128 4 16384 128 4
1
32768 256 1 32768 256 1
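
To make the layout concrete, the following is a minimal Python sketch of a reader for this input format. It is not the thesis code: the names `parse_cache_in`, `SimConfig`, and `CacheDef` are hypothetical, blank lines are assumed to be insignificant, and the label "kernel" for the implicit process 0 is borrowed from the example output in A.2.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Geometry = Tuple[int, ...]  # (cache size, block size, associativity)


@dataclass
class CacheDef:
    split: bool                 # False = unified (type 0), True = split (type 1)
    icache: Geometry            # unified cache, or instruction cache when split
    dcache: Optional[Geometry]  # data cache; None for a unified cache


@dataclass
class SimConfig:
    name: str
    processes: List[str]        # index 0 is the implicit operating system entry
    caches: List[CacheDef]


def parse_cache_in(text: str) -> SimConfig:
    # Assumption: blank lines carry no meaning, so they are dropped up front.
    lines = iter([ln.strip() for ln in text.splitlines() if ln.strip()])
    name = next(lines)
    n_proc = int(next(lines))
    # Only n-1 names are listed; process 0 is assumed to be the OS.
    processes = ["kernel"] + [next(lines) for _ in range(n_proc - 1)]
    caches = []
    for _ in range(int(next(lines))):
        ctype = int(next(lines))
        params = [int(tok) for tok in next(lines).split()]
        if ctype == 0 and len(params) == 3:
            caches.append(CacheDef(False, tuple(params), None))
        elif ctype == 1 and len(params) == 6:
            caches.append(CacheDef(True, tuple(params[:3]), tuple(params[3:])))
        else:
            raise ValueError(f"bad cache definition: type {ctype}, params {params}")
    return SimConfig(name, processes, caches)


EXAMPLE = """multi process test
3
cc1 -O -quiet stmt.i -o stmt
espresso tial.in > /dev/null
3
0
16384 64 2
1
16384 128 4 16384 128 4
1
32768 256 1 32768 256 1
"""

config = parse_cache_in(EXAMPLE)
print(config.name, len(config.processes), len(config.caches))
```

The sketch rejects a definition whose parameter count does not match its type, which is one simple way to catch a unified/split mismatch early; the 80-character name limit is not enforced here.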




A.2 Output Format

The simulation results were dumped to a file called cache.out. The output format has a
banner page followed by a page of results for each cache. Results are recorded at the end of each
program in the simulation; however, the second set of data was removed from the example for brevity.
The format is self evident from the example shown below. In hindsight, the output file should have
used a format directly readable by a spreadsheet program. The format below is easy to understand,
but it requires manual entry of data into spreadsheets for analysis.
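
As a hedged illustration of the spreadsheet-readable alternative mentioned above, the sketch below converts statistic rows of the form used in the example that follows (`Inst <references> Miss <misses> Perc <rate>`) into CSV. The function name, the column names, and the choice to ignore all other lines are assumptions made for this sketch; it is not part of the original tools.

```python
import csv
import io
import re

# One statistics row, e.g. "Inst 160240175 Miss 5166542 Perc 3.224249".
ROW = re.compile(
    r"^(Inst|Data|read|writ|TOTAL)\s+(\d+)\s+Miss\s+(\d+)\s+Perc\s+([\d.]+)$"
)


def rows_to_csv(report: str) -> str:
    """Return the statistics rows of a report as CSV text with a header line."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["kind", "references", "misses", "miss_rate_percent"])
    for line in report.splitlines():
        match = ROW.match(line.strip())
        if match:  # headers, separators, and interference lines are skipped
            writer.writerow(match.groups())
    return out.getvalue()


SAMPLE = """Process #1
Inst 160240175 Miss 5166542 Perc 3.224249
Data 69272178 Miss 4512864 Perc 6.514685
Interference (number times process 1 overwrote:)
"""

csv_text = rows_to_csv(SAMPLE)
print(csv_text)
```

A fuller converter would also carry the cache number and process number on each row so that results from several caches can be sorted and plotted together.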


<><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
SIMULATION: multi process test
<><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
Number Tasks = 3
#0: kernel
#1: cc1 -O -quiet stmt.i -o stmt
#2: espresso tial.in > /dev/null
Number Caches = 3
(type, icsize, ilsize, iassoc, dcsize, dlsize, dassoc)
#0: 0 16384 64 2
#1: 1 16384 128 4 16384 128 4
#2: 1 32768 256 1 32768 256 1

DATA AT END OF PROCESS 1

<><><><><><><><><><><><><><><><><><><><><><><><><>
simulation: multi process test
(data at end of process 1)

CACHE # 0

cache type: 0 (0=unified, 1=split)
icache size: 16384
icache line size: 64
icache associativity: 2

**********

Process #0

Inst 39004710 Miss 2739339 Perc 7.023098
Data 16350661 Miss 3071643 Perc 18.786048
read 10758087 Miss 2366717 Perc 21.999422
writ 5592574 Miss 704926 Perc 12.604679
TOTAL 55355371 Miss 5810982 Perc 10.497594

Interference (number times process 0 overwrote:)
Process 0 = 2614797
Process 1 = 2207422
Process 2 = 988510
Process 3 = 253
(process 3 is invalid data)

Process #1

Inst 160240175 Miss 5166542 Perc 3.224249
Data 69272178 Miss 4512864 Perc 6.514685
read 50197333 Miss 3475694 Perc 6.924061
writ 19074845 Miss 1037170 Perc 5.437371
TOTAL 229512353 Miss 9679406 Perc 4.217379

Interference (number times process 1 overwrote:)
Process 0 = 2175838
Process 1 = 4910549
Process 2 = 2287801
Process 3 = 3
(process 3 is invalid data)

Process #2

Inst 224015943 Miss 1813316 Perc 0.809458
Data 63229661 Miss 3257726 Perc 5.152212
read 51131731 Miss 2778587 Perc 5.434174
writ 12097930 Miss 479139 Perc 3.960504
TOTAL 287245604 Miss 5071042 Perc 1.765403

Interference (number times process 2 overwrote:)
Process 0 = 1020129
Process 1 = 2561443
Process 2 = 1489470
Process 3 = 0
(process 3 is invalid data)

TOTAL FOR CACHE

Inst 423260828 Miss 9719197 Perc 2.296267
Data 148852500 Miss 10842233 Perc 7.283877
read 112087151 Miss 8620998 Perc 7.691335
writ 36765349 Miss 2221235 Perc 6.041654
TOTAL 572113328 Miss 20561430 Perc 3.593944

simulation:  multi  process  test 

(data  at  end  of  process  1) 


CACHE  #  1 

cache type: 1 (0=unified, 1=split)

icache  size:  16384 

icache  line  size:  128 

icache  associativity:  4 

dcache  size:  16384 

dcache  line  size:  128 

dcache  associativity:  4 

Process  #0 

Inst  39028217  Miss  1297351  Perc  3.324136 

Data  16360315  Miss  2091714  Perc  12.785292 




read  10764480  Miss  1706268  Perc  15.850910 

writ  5595835  Miss  385446  Perc  6.888087 

TOTAL  55388532  Miss  3389065  Perc  6.118712 

Interferance (number times process 0 overwrote:)
Process  0  =  1358317 

Process  1  =  1358773 

Process  2  =  671722 

Process  3  =  253 

(process  3  is  invalid  data) 

********************

Process  #1 

Inst  160240175  Miss  2378836  Perc  1.484544 

Data  69272178  Miss  2370733  Perc  3.422345 

read  50197333  Miss  1965331  Perc  3.915210 

writ  19074845  Miss  405402  Perc  2.125323 

TOTAL  229512353  Miss  4749569  Perc  2.069418 

Interference  (number  times  process  1  overwrote:) 
Process  0  =  1356440 

Process  1  =  2358083 

Process  2  =  1945071 

Process  3  =  3 

(process  3  is  invalid  data) 

********************

Process  #2 

Inst  224033574  Miss  652803  Perc  0.291386 

Data  63235212  Miss  1542671  Perc  2.439576 

read  51136035  Miss  1321124  Perc  2.583548 

writ  12099177  Miss  221547  Perc  1.831091 

TOTAL  287268786  Miss  2195474  Perc  0.764258 

Interference  (number  times  process  2  overwrote:) 
Process  0  =  674120 

Process  1  =  993262 

Process  2  =  488640 

Process  3  =  0 

(process  3  is  invalid  data) 

********************

TOTAL  FOR  CACHE 

Inst  423301966  Miss  4328990  Perc  1.022672 

Data  148867705  Miss  6005118  Perc  4.033862 

read  112097848  Miss  4992723  Perc  4.453897 

writ  36769857  Miss  1012395  Perc  2.753329 

TOTAL  572169671  Miss  10334108  Perc  1.806126 

simulation:  multi  process  test 

(data  at  end  of  process  1) 


CACHE  #  2 

cache type: 1 (0=unified, 1=split)
icache  size:  32768 
icache  line  size:  256 
icache  associativity:  1 
dcache  size:  32768 




dcache  line  size:  256 
dcache  associativity:  1 

Process  #0 

Inst  39100207  Miss  877237  Perc  2.243561 

Data  16384285  Miss  2191363  Perc  13.374786 

read  10780440  Miss  1793502  Perc  16.636631 

writ  5603845  Miss  397861  Perc  7.099786 

TOTAL  55484492  Miss  3068600  Perc  5.530554 

Interferance  (number  times  process  0  overwrote:) 
Process  0  =  1704283 

Process  1  =  946851 

Process  2  =  417213 

Process  3  =  253 

(process  3  is  invalid  data) 

Process  #1 

Inst  160240175  Miss  1414353  Perc  0.882646 

Data  69272178  Miss  2717362  Perc  3.922732 

read  50197333  Miss  2261685  Perc  4.505588 

writ  19074845  Miss  455677  Perc  2.388890 

TOTAL  229512353  Miss  4131715  Perc  1.800215 

Interferance  (number  times  process  1  overwrote:) 
Process  0  =  942089 

Process  1  =  2260273 

Process  2  =  929350 

Process  3  =  3 

(process  3  is  invalid  data) 

Process  #2 

Inst  224033574  Miss  435774  Perc  0.194513 

Data  63235212  Miss  2459827  Perc  3.889964 

read  51136035  Miss  2205351  Perc  4.312714 

writ  12099177  Miss  254476  Perc  2.103250 

TOTAL  287268786  Miss  2895601  Perc  1.007976 

Interferance  (number  times  process  2  overwrote:) 
Process  0  =  422012 

Process  1  =  924590 

Process  2  =  1548999 

Process  3  =  0 

(process  3  is  invalid  data) 

********************

TOTAL  FOR  CACHE 

Inst  423373956  Miss  2727364  Perc  0.644197 

Data  148891675  Miss  7368552  Perc  4.948935 

read  112113808  Miss  6260538  Perc  5.584092 

writ  36777867  Miss  1108014  Perc  3.012720 

TOTAL  572265631  Miss  10095916  Perc  1.764201 

DATA  AT  END  OF  PROCESS  2 

<><><><><><><><><><><><><><><><><><><><><><><><><> 

(format  repeats  for  data  at  end  of  second  process) 
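
The columns of this report are related by simple arithmetic. The sketch below reproduces two of the relations using the Process #0 figures for cache #0: the "Perc" column is 100 times misses divided by references, and, because each miss in the analysis routines increments exactly one interference counter, the interference counters of a process sum to its total misses. The names row_t, miss_percent and interference_total are illustrative only and do not appear in the thesis code (the actual report routine, printres, is not listed in this section).

```c
#include <assert.h>

/* Hypothetical record for one row of the report (not from the thesis). */
typedef struct {
    unsigned long refs;    /* references of this kind      */
    unsigned long misses;  /* misses among those references */
} row_t;

/* Miss percentage as shown in the "Perc" column. */
double miss_percent(row_t r)
{
    assert(r.refs > 0);
    return 100.0 * (double)r.misses / (double)r.refs;
}

/* Sum of the per-victim interference counters of one process. */
unsigned long interference_total(const unsigned long *counts, int n)
{
    unsigned long sum = 0;
    int i;

    assert(counts != 0 && n > 0);
    for (i = 0; i < n; i++)
        sum += counts[i];
    return sum;
}
```

For example, passing the instruction row (39004710 references, 2739339 misses) to miss_percent gives roughly 7.0231, and summing the four interference counters 2614797, 2207422, 988510 and 253 gives 5810982, the TOTAL miss count reported for Process #0.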




A.3 Cache Model Library

The following file, cache.h, was used as a definition/procedure library for the basic cache
simulator:

/* CACHE.H */
/* CACHE SIMULATION LIBRARY */
/* JOHN FRASER */

/* SIMULATION CHARACTERISTICS */

/* MAXIMUM NUMBER OF CACHES IN SIMULATION */
#define MAXCACHES 40

/* MAXIMUM NUMBER OF PROCESSES IN SIMULATION */
#define MAXTASKS 4

/* MAXIMUM NUMBER OF LINES (CSIZE/(BSIZE*ASSOC)) IN CACHES */
#define MAXLINE 512

/* MAXIMUM ASSOCIATIVITY OF CACHES */
#define MAXASSOC 4

/* CACHE PARAMETERS */
typedef struct
{
    /* CACHE TYPE (0=UNIFIED, 1=SPLIT) */
    int type;
    /* CACHE SIZE FOR EACH SECTION (0=UNIFIED/INST, 1=DATA) */
    int csize[2];
    /* BLOCK (LINE) SIZE FOR EACH SECTION */
    int lsize[2];
    /* ASSOCIATIVITY FOR EACH SECTION */
    int assoc[2];
    /* BIT SHIFT USED TO ISOLATE TAG FROM ADDRESS */
    int tshift[2];
    /* BIT SHIFT USED TO ISOLATE LINE FROM ADDRESS */
    int lshift[2];
    /* BIT MASK USED TO ISOLATE LINE FROM ADDRESS */
    int lmask[2];
} param;

/* CACHE BLOCK STORAGE */
typedef struct
{
    /* BLOCK TAG */
    long tag;
    /* BLOCK 'USE BITS' FOR ASSOCIATIVE CACHES */
    unsigned long use;
    /* BLOCK OWNER PROCESS */
    int task;
} block;

/* CACHE PERFORMANCE STATISTICS */
typedef struct
{
    /* NUMBER OF INSTRUCTION FETCHES */
    unsigned long instcnt;
    /* NUMBER OF DATA LOADS */
    unsigned long readcnt;
    /* NUMBER OF DATA STORES */
    unsigned long writcnt;
    /* NUMBER OF OVERWRITES OVER EACH PROCESS */
    /* NUMTASKS+1 = INVALID DATA */
    unsigned long interfere[MAXTASKS+1];
    /* NUMBER OF INSTRUCTION FETCH MISSES */
    unsigned long instmisscnt;
    /* NUMBER OF DATA LOAD MISSES */
    unsigned long readmisscnt;
    /* NUMBER OF DATA STORE MISSES */
    unsigned long writmisscnt;
} stats;

/* STRING DEFINITION */
typedef char string[80];

/* SHARED ATOM DATA */
typedef struct
{
    /* NUMBER OF CACHES IN USE */
    int numcaches;
    /* NUMBER OF CACHES IN SIMULATION */
    int actcaches;
    /* NUMBER OF PROCESSES IN SIMULATION */
    int numtasks;
    /* NUMBER OF PROCESSES CURRENTLY EXECUTING */
    int count;
    /* PID OF CURRENT PROCESS */
    int curtask;
    /* PROCESS NAMES */
    string name[MAXTASKS];
    /* CACHE PARAMETERS */
    param para[MAXCACHES];
    /* CACHE STATE (BLOCK INFORMATION) */
    block data[MAXCACHES][2][MAXLINE][MAXASSOC];
    /* PERFORMANCE STATISTICS */
    stats stat[MAXCACHES][MAXTASKS];
} datablock;

/* INTEGER LOG2 FUNCTION */
int mylog2(int num)
{
    if (num < 2)
        return(0);
    else
        return(1 + mylog2(num/2));
}
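
The shift and mask fields of the parameter structure are what split a reference address into a line index and a tag. The program initialization routine (Section A.7) sets the tag shift to log2(cache size / associativity), the line shift to log2(line size), and the line mask to (cache size / associativity) - 1. The following is a minimal sketch of that split, assuming power-of-two sizes; split_t, ilog2 and split_address are hypothetical names introduced here for illustration.

```c
#include <assert.h>

/* Integer log2, written the same way as mylog2 in cache.h. */
static int ilog2(int num)
{
    return (num < 2) ? 0 : 1 + ilog2(num / 2);
}

/* Hypothetical result type: the two fields an address is split into. */
typedef struct {
    long line;  /* line (set) index within the cache */
    long tag;   /* remaining high-order address bits */
} split_t;

/* Split addr for a cache of csize bytes, lsize-byte lines, assoc ways.
   All three sizes are assumed to be powers of two. */
split_t split_address(long addr, int csize, int lsize, int assoc)
{
    split_t s;
    int waysize;

    assert(csize > 0 && lsize > 0 && assoc > 0);
    waysize = csize / assoc;                      /* bytes covered by one way */
    s.line = (addr & (waysize - 1)) >> ilog2(lsize);
    s.tag  = addr >> ilog2(waysize);
    return s;
}
```

With the parameters of cache #0 from the sample output (16384 bytes, 64-byte lines, 2-way), one way covers 8192 bytes, so address 0x12345 falls in line 13 with tag 9.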




A.4 Kernel Instrumentation File

The kernel instrumentation file kern.inst.c is responsible for adding the calls to the analysis
routines at the appropriate points. A call to the initialization function is made when the program
is initially loaded; thereafter, calls to the various analysis routines are made at each data
reference and at each set of instructions. A call is inserted at the start of the hardclock
interrupt service routine for scaling purposes. Note the test to check for the kernel procedures
which cannot be instrumented.

/* KERN.INST.C */
/* KERNEL INSTRUMENTATION FILE */
/* JOHN FRASER */

#include <string.h>
#include <cmplrs/atom.inst.h>

/* DEFINE PROCESS ID */
#define PROCNUM 0

/* TEST FOR ROUTINES WHICH CANNOT BE TRACED */
int CanInstrument(Proc *p)
{
    const char* name = ProcFileName(p);

    return(strcmp("../src/kernel/arch/alpha/locore.s",name) !=0 &&
           strcmp("../../../../src/kernel/arch/alpha/lockprim.s",name) !=0 &&
           strcmp("./src/kernel/arch/alpha/spl.s",name) !=0);
}

/* INSTRUMENT: */
/* ALL DATA REFERENCES AND */
/* SETS OF 8 INSTRUCTIONS OR LESS */
/* (WITHIN SAME BASIC BLOCK) */
/* ANALYSIS ROUTINES: */
/* INSTRUCTION FETCH(ADDRESS,PID,NUMBER) */
/* DATA LOAD(ADDRESS,PID) */
/* DATA STORE(ADDRESS,PID) */

unsigned InstrumentAll(int argc, char** argv)
{
    Obj* o;
    Proc* p;
    Block* b;
    Inst* i;

    /* ADD PROCEDURE PROTOTYPES */
    AddCallProto("initcache()");
    AddCallProto("instref(REGV, int, int)");
    AddCallProto("readref(VALUE, int)");
    AddCallProto("writref(VALUE, int)");
    AddCallProto("skipcall(REGV, REGV)");

    /* ADD INITIALIZATION CALL */
    AddCallProgram(ProgramBefore, "initcache");

    /* ITERATE THROUGH ORIGINAL CODE ADDING REFERENCE CALLS */
    o = GetFirstObj();
    if (BuildObj(o)) return 1;

    p = GetNamedProc("hardclock");

    /* ADD CALL FOR HARDCLOCK SCALING */
    AddCallProc(p, ProcBefore, "skipcall", REG_SP, REG_RA);

    for (p=GetFirstObjProc(o); p!=NULL; p=GetNextProc(p))
    {
        if (CanInstrument(p))
        {
            for (b=GetFirstBlock(p); b!=NULL; b=GetNextBlock(b))
            {
                long pcEnd = InstPC(GetLastInst(b));
                int count = 0;

                for (i=GetFirstInst(b); i!=NULL; i=GetNextInst(i))
                {
                    /* INSTRUCTION FETCH */
                    if ((count & 7) == 0)
                    {
                        int instRem = ((pcEnd-InstPC(i))/4)+1;
                        int instrLine = (instRem > 8) ? 8 : instRem;

                        AddCallInst(i, InstBefore, "instref", REG_PC, PROCNUM, instrLine);
                    }
                    count++;

                    /* DATA LOAD */
                    if (IsInstType(i, InstTypeLoad))
                        AddCallInst(i, InstBefore, "readref", EffAddrValue, PROCNUM);

                    /* DATA STORE */
                    if (IsInstType(i, InstTypeStore))
                        AddCallInst(i, InstBefore, "writref", EffAddrValue, PROCNUM);
                }
            }
        }
    }

    WriteObj(o);
    return(0);
}
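
The listing above issues one instruction-fetch call for every group of at most eight instructions within a basic block: a call is added whenever the running count is a multiple of eight, and it is passed the smaller of eight and the number of instructions remaining in the block. The following sketch reproduces only that grouping arithmetic, outside of ATOM; fetch_groups and its parameters are hypothetical names used for illustration.

```c
#include <assert.h>

/* For a basic block of n instructions, record in sizes[] the instruction
   count that each successive fetch call would be given, and return the
   number of calls. max is the capacity of sizes[]. */
int fetch_groups(int n, int *sizes, int max)
{
    int count;
    int calls = 0;

    assert(n > 0 && sizes != 0 && max > 0);
    for (count = 0; count < n; count++) {
        if ((count & 7) == 0) {               /* every eighth instruction */
            int rem = n - count;              /* instructions left in block */
            assert(calls < max);
            sizes[calls++] = (rem > 8) ? 8 : rem;
        }
    }
    return calls;
}
```

A block of 19 instructions, for instance, yields three calls covering 8, 8 and 3 instructions, so the per-call counts always add up to the block length.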




A.5 Kernel Analysis File

The kernel analysis file kern.anal.c defines the analysis routines called in the instrumentation
file, and any other utility functions/procedures. There are five analysis routines to consider:

Initialization  The initialization routine is responsible for establishing the basic simulation
parameters when the kernel is loaded. The simulator is essentially put into a paused simulation
state (0 caches) so that it is not actively capturing and processing references until a test
program is started.

Hardclock Scaling  This procedure discards a certain number of hardclock interrupts, controlled
by a scaling factor.

Instruction Fetch Routine  The instruction fetch routine is responsible for servicing instruction
fetches in the reference stream. It processes each set of references in the cache based on the
set's starting address, the number of instructions in the set, and the PID of the sending process.
Using a PID allows the same code to be used for each process's analysis routines as well as
maintaining cache coherency.

Data Load Routine  The data load routine is responsible for servicing the data loads in the
reference stream. It is almost identical to the previous routine except for the necessity of
determining which cache to access depending on a split or unified model, and the fact that it
services only a single reference at a time.

Data Store Routine  The analysis routine for data stores is almost identical to the data load
routine except that it increments different counters.

The similarities between the routines would suggest that the common aspects be defined in a
separate function which is called by each analysis routine, but this increases the processing
latency by an unacceptable degree. The data used by these routines is defined in the library file
and is implemented as global variables.
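
The hardclock scaling described above amounts to a modulo counter: only one interrupt in every SCALE is allowed through to the service routine, and the rest are discarded. The sketch below models just that counting decision in portable C. It is an illustration under that reading, not the kernel routine itself, which discards an interrupt by returning directly to the caller with inline assembly; clock_scaler and scaler_tick are hypothetical names.

```c
#include <assert.h>

/* Hypothetical state for the scaling decision. */
typedef struct {
    int scale;  /* service one interrupt in every `scale`         */
    int count;  /* interrupts seen since the last serviced one    */
} clock_scaler;

/* Returns 1 if this interrupt should be serviced, 0 if it is discarded. */
int scaler_tick(clock_scaler *s)
{
    assert(s != 0 && s->scale >= 1);
    s->count++;
    if (s->count >= s->scale) {
        s->count = 0;
        return 1;
    }
    return 0;
}
```

With a scale of 3 the sequence of decisions is discard, discard, service, repeating; with a scale of 1 every interrupt is serviced, which is the state the emergency routine in the listing restores.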

/* KERN.ANAL.C */
/* KERNEL ANALYSIS FILE */
/* JOHN FRASER */

/* HARDCLOCK SCALING VALUE */
#define SCALE 3

#include "cache.h"
#include <stdio.h>
#include <c_asm.h>

/* SHARED CACHE DATA */
datablock satom;

/* HARDCLOCK SCALING DATA */
int clockscale = 1;
int clockcount = 0;

/* INITIALIZE BASIC PARAMETERS */
/* SIMULATION (CAPTURE) DISABLED */
void initcache()
{
    satom.numcaches = 0;
    satom.actcaches = 0;
    satom.numtasks = 0;
    satom.curtask = 0;
    satom.count = 0;
    clockscale = SCALE;
    clockcount = 0;
    return;
}

/* HARDCLOCK SCALING */
void skipcall(unsigned long sp, unsigned long ra)
{
    clockcount++;
    if (clockcount >= clockscale)
    {
        clockcount = 0;
        return;
    }
    asm("mov %a0, %sp",sp);
    asm("mov %a1, %ra",ra);
    asm("ret %zero, (%ra)");
    return;
}

/* SCALING EMERGENCY */
void KernelPanic()
{
    clockscale = 1;
    return;
}

/* INSTRUCTION REFERENCE ROUTINE */
void instref(long addr, int proc, int count)
{
    int x, leastx;
    unsigned long leastused;
    long aline, atag;
    int cnum, hit;

    /* PAUSE CAPTURE (RE-ENTRANCE) */
    int tempnumcaches = satom.numcaches;
    satom.numcaches = 0;

    /* PROCESS REFERENCES IN EACH CACHE */
    for (cnum=0; cnum<tempnumcaches; cnum++)
    {
        int assoc = (satom.para[cnum]).assoc[0];

        /* UPDATE STATISTICS */
        ((satom.stat[cnum][proc]).instcnt) += count;

        /* PARSE ADDRESS */
        aline = (addr & (satom.para[cnum]).lmask[0]) >>
                (satom.para[cnum]).lshift[0];
        atag = addr >> (satom.para[cnum]).tshift[0];

        /* UPDATE 'USE BITS' AND CHECK FOR HIT */
        hit = 0;
        for (x=0; x<assoc; x++)
        {
            ((satom.data[cnum][0][aline][x]).use)++;
            if (((satom.data[cnum][0][aline][x]).tag == atag) &&
                ((satom.data[cnum][0][aline][x]).task == proc))
            {
                (satom.data[cnum][0][aline][x]).use = 0;
                hit = 1;
            }
        }

        /* IF NO HIT, FIND LRU BLOCK TO EVICT */
        if (hit == 0)
        {
            /* FIND LRU */
            leastused = 0;
            for (x=0; x<assoc; x++)
            {
                if (((satom.data[cnum][0][aline][x]).use >= leastused) ||
                    ((satom.data[cnum][0][aline][x]).task ==
                     satom.numtasks))
                {
                    leastused = (satom.data[cnum][0][aline][x]).use;
                    leastx = x;
                }
                if ((satom.data[cnum][0][aline][x]).task ==
                    satom.numtasks)
                    x = assoc;
            }

            /* UPDATE STATISTICS */
            ((satom.stat[cnum][proc]).instmisscnt)++;
            ((satom.stat[cnum][proc]).interfere[
                (satom.data[cnum][0][aline][leastx]).task])++;

            /* UPDATE CACHE DATA */
            (satom.data[cnum][0][aline][leastx]).tag = atag;
            (satom.data[cnum][0][aline][leastx]).use = 0;
            (satom.data[cnum][0][aline][leastx]).task = proc;
        }
    }

    /* RESUME CAPTURE */
    satom.numcaches = tempnumcaches;
    return;
}

/* DATA LOAD ROUTINE */
void readref(long addr, int proc)
{
    int index;
    int x, leastx;
    unsigned long leastused;
    long aline, atag;
    int cnum, hit;

    /* PAUSE CAPTURE (RE-ENTRANCE) */
    int tempnumcaches = satom.numcaches;
    satom.numcaches = 0;

    /* PROCESS REFERENCE IN EACH CACHE */
    for (cnum=0; cnum<tempnumcaches; cnum++)
    {
        int type = (satom.para[cnum]).type;
        int assoc = (satom.para[cnum]).assoc[type];

        /* UPDATE STATISTICS */
        ((satom.stat[cnum][proc]).readcnt)++;

        /* PARSE ADDRESS */
        aline = (addr & (satom.para[cnum]).lmask[type]) >>
                (satom.para[cnum]).lshift[type];
        atag = addr >> (satom.para[cnum]).tshift[type];

        /* UPDATE 'USE BITS' AND CHECK FOR HIT */
        hit = 0;
        for (x=0; x<assoc; x++)
        {
            ((satom.data[cnum][type][aline][x]).use)++;
            if (((satom.data[cnum][type][aline][x]).tag == atag) &&
                ((satom.data[cnum][type][aline][x]).task == proc))
            {
                (satom.data[cnum][type][aline][x]).use = 0;
                hit = 1;
            }
        }

        /* IF NO HIT, FIND LRU BLOCK TO EVICT */
        if (hit == 0)
        {
            /* FIND LRU */
            leastused = 0;
            for (x=0; x<assoc; x++)
            {
                if (((satom.data[cnum][type][aline][x]).use >= leastused) ||
                    ((satom.data[cnum][type][aline][x]).task ==
                     satom.numtasks))
                {
                    leastused = (satom.data[cnum][type][aline][x]).use;
                    leastx = x;
                }
                if ((satom.data[cnum][type][aline][x]).task ==
                    satom.numtasks)
                    x = assoc;
            }

            /* UPDATE STATISTICS */
            ((satom.stat[cnum][proc]).readmisscnt)++;
            ((satom.stat[cnum][proc]).interfere[
                (satom.data[cnum][type][aline][leastx]).task])++;

            /* UPDATE CACHE DATA */
            (satom.data[cnum][type][aline][leastx]).tag = atag;
            (satom.data[cnum][type][aline][leastx]).use = 0;
            (satom.data[cnum][type][aline][leastx]).task = proc;
        }
    }

    /* RESUME CAPTURE */
    satom.numcaches = tempnumcaches;
    return;
}

/* DATA STORE ROUTINE */
void writref(long addr, int proc)
{
    int index;
    int x, leastx;
    unsigned long leastused;
    long aline, atag;
    int cnum, hit;

    /* PAUSE CAPTURE (RE-ENTRANCE) */
    int tempnumcaches = satom.numcaches;
    satom.numcaches = 0;

    /* PROCESS REFERENCE IN EACH CACHE */
    for (cnum=0; cnum<tempnumcaches; cnum++)
    {
        int type = (satom.para[cnum]).type;
        int assoc = (satom.para[cnum]).assoc[type];

        /* UPDATE STATISTICS */
        ((satom.stat[cnum][proc]).writcnt)++;

        /* PARSE ADDRESS */
        aline = (addr & (satom.para[cnum]).lmask[type]) >>
                (satom.para[cnum]).lshift[type];
        atag = addr >> (satom.para[cnum]).tshift[type];

        /* UPDATE 'USE BITS' AND CHECK FOR HIT */
        hit = 0;
        for (x=0; x<assoc; x++)
        {
            ((satom.data[cnum][type][aline][x]).use)++;
            if (((satom.data[cnum][type][aline][x]).tag == atag) &&
                ((satom.data[cnum][type][aline][x]).task == proc))
            {
                (satom.data[cnum][type][aline][x]).use = 0;
                hit = 1;
            }
        }

        /* IF NO HIT, FIND LRU BLOCK TO EVICT */
        if (hit == 0)
        {
            /* FIND LRU */
            leastused = 0;
            for (x=0; x<assoc; x++)
            {
                if (((satom.data[cnum][type][aline][x]).use >= leastused) ||
                    ((satom.data[cnum][type][aline][x]).task == satom.numtasks))
                {
                    leastused = (satom.data[cnum][type][aline][x]).use;
                    leastx = x;
                }
                if ((satom.data[cnum][type][aline][x]).task == satom.numtasks)
                    x = assoc;
            }

            /* UPDATE STATISTICS */
            ((satom.stat[cnum][proc]).writmisscnt)++;
            ((satom.stat[cnum][proc]).interfere[
                (satom.data[cnum][type][aline][leastx]).task])++;

            /* UPDATE CACHE DATA */
            (satom.data[cnum][type][aline][leastx]).tag = atag;
            (satom.data[cnum][type][aline][leastx]).use = 0;
            (satom.data[cnum][type][aline][leastx]).task = proc;
        }
    }

    /* RESUME CAPTURE */
    satom.numcaches = tempnumcaches;
    return;
}
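
The three reference routines above share one replacement policy: every block in the addressed line has its use counter aged on each access, a hit requires both a matching tag and a matching owner process, and on a miss the victim is either the first block still marked invalid or the block with the largest use counter. The following self-contained sketch isolates that policy for a single two-way line. It is an illustration only; way_t, access_set, WAYS and INVALID_TASK are hypothetical names, and the fixed value of INVALID_TASK stands in for the numtasks marker used in the listing.

```c
#include <assert.h>

#define WAYS 2
#define INVALID_TASK 3   /* assumption: owner value marking an empty block */

/* One block of a line: tag, age counter and owning process. */
typedef struct {
    long tag;
    unsigned long use;
    int task;
} way_t;

/* Access one line. Returns 1 on a hit, 0 on a miss. On a miss,
   *victim_task receives the previous owner of the replaced block. */
int access_set(way_t set[WAYS], long tag, int task, int *victim_task)
{
    int x, hit = 0, least = 0;
    unsigned long leastused = 0;

    assert(set != 0 && victim_task != 0);
    assert(task >= 0 && task < INVALID_TASK);

    /* Age every block; a hit needs matching tag and matching owner. */
    for (x = 0; x < WAYS; x++) {
        set[x].use++;
        if (set[x].tag == tag && set[x].task == task) {
            set[x].use = 0;
            hit = 1;
        }
    }
    if (hit)
        return 1;

    /* Victim: first invalid block, otherwise the oldest block. */
    for (x = 0; x < WAYS; x++) {
        if (set[x].task == INVALID_TASK) {
            least = x;
            break;
        }
        if (set[x].use >= leastused) {
            leastused = set[x].use;
            least = x;
        }
    }
    *victim_task = set[least].task;
    set[least].tag = tag;
    set[least].use = 0;
    set[least].task = task;
    return 0;
}
```

Because the owner is part of the hit test, two processes referencing the same address never hit on each other's blocks, and the value returned through victim_task is what the simulator uses to index the interference counters.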




A.6 Program Instrumentation File

The program instrumentation file prog.inst.c is not substantially different from the kernel
version. The primary change is the removal of the test for specific procedures which cannot be
instrumented. The other alteration is the inclusion of a procedure at program end to write the
simulation's results to a file. If multiple test programs are used, each uses a different
instrumentation file with a unique process identifier assigned in the #define statement.

/* PROG.INST.C */
/* PROGRAM INSTRUMENTATION FILE */
/* JOHN FRASER */

#include <string.h>
#include <cmplrs/atom.inst.h>

/* DEFINE PROCESS ID */
#define PROCNUM 1

/* INSTRUMENT: */
/* ALL DATA REFERENCES AND */
/* SETS OF 8 INSTRUCTIONS OR LESS */
/* (WITHIN SAME BASIC BLOCK) */
/* ANALYSIS ROUTINES */
/* INSTRUCTION FETCH(ADDRESS,PID,NUMBER) */
/* DATA LOAD(ADDRESS,PID) */
/* DATA STORE(ADDRESS,PID) */

unsigned InstrumentAll(int argc, char** argv)
{
    Obj* o;
    Proc* p;
    Block* b;
    Inst* i;

    /* ADD PROCEDURE PROTOTYPES */
    AddCallProto("initcache(int)");
    AddCallProto("instref(REGV, int, int)");
    AddCallProto("readref(VALUE, int)");
    AddCallProto("writref(VALUE, int)");
    AddCallProto("printres(int)");

    /* ADD INITIALIZATION CALL */
    AddCallProgram(ProgramBefore, "initcache", PROCNUM);

    /* ADD RESULTS OUTPUT CALL */
    AddCallProgram(ProgramAfter, "printres", PROCNUM);

    /* ITERATE THROUGH ORIGINAL CODE ADDING REFERENCE CALLS */
    o = GetFirstObj();
    if (BuildObj(o)) return 1;

    for (p=GetFirstObjProc(o); p!=NULL; p=GetNextProc(p))
    {
        for (b=GetFirstBlock(p); b!=NULL; b=GetNextBlock(b))
        {
            long pcEnd = InstPC(GetLastInst(b));
            int count = 0;

            for (i=GetFirstInst(b); i!=NULL; i=GetNextInst(i))
            {
                if ((count & 7) == 0)
                {
                    int instRem = ((pcEnd-InstPC(i))/4)+1;
                    int instrLine = (instRem > 8) ? 8 : instRem;

                    AddCallInst(i, InstBefore, "instref", REG_PC, PROCNUM, instrLine);
                }
                count++;

                if (IsInstType(i, InstTypeLoad))
                    AddCallInst(i, InstBefore, "readref", EffAddrValue, PROCNUM);
                if (IsInstType(i, InstTypeStore))
                    AddCallInst(i, InstBefore, "writref", EffAddrValue, PROCNUM);
            }
        }
    }

    WriteObj(o);
    return(0);
}




A.7 Program Analysis File

The program analysis file prog.anal.c is almost identical to the kernel version, except for the
initialization and conclusion routines. The reference processing routines perform the same
function; the other two are described below:

Initialization  The initialization routine is much more complex than its kernel equivalent. First
it must map the shared data into the program's address space by calling mmap on /dev/mem. If the
test program is the first to be executed for that simulation, it also reads the simulation data
from the input file, initializes the cache data, and enables the simulation.

Conclusion  The final routine is not present in the kernel because it is executed at program
completion. It is responsible for writing the simulation results to the output file.
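
Mapping the shared data requires some page arithmetic, because a memory mapping must start on a page boundary while the kernel structure generally does not. The sketch below shows that arithmetic on its own: the kernel address is reduced to its low 32 bits, split into a page-aligned base and an offset within the page, and the mapping length is rounded up to whole pages. It assumes an 8 KB page size; map_plan, plan_mapping and PAGE_BYTES are hypothetical names, whereas the listing itself relies on the system's alpha_trunc_page, alpha_round_page and ALPHA_PGBYTES definitions.

```c
#include <assert.h>

#define PAGE_BYTES 8192ULL   /* assumption: 8 KB pages */

/* Hypothetical description of the mapping to request. */
typedef struct {
    unsigned long long physbase;  /* page-aligned offset handed to mmap     */
    unsigned long long pgoff;     /* offset of the data within its first page */
    unsigned long long length;    /* mapping length in whole pages          */
} map_plan;

/* Plan a mapping that covers `size` bytes starting at kernel address kaddr. */
map_plan plan_mapping(unsigned long long kaddr, unsigned long long size)
{
    map_plan m;

    assert(size > 0);
    m.pgoff    = kaddr & (PAGE_BYTES - 1);
    m.physbase = (kaddr & ~(PAGE_BYTES - 1)) & 0xffffffffULL;
    m.length   = (m.pgoff + size + PAGE_BYTES - 1) & ~(PAGE_BYTES - 1);
    return m;
}
```

The pointer to the shared structure is then the address returned by the mapping call plus pgoff. For a 10000-byte structure at kernel address 0xfffffc0000123456, the plan is a base of 0x122000, an offset of 0x1456 and a length of 16384 bytes (two pages).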

/* PROG.ANAL.C */
/* PROGRAM ANALYSIS FILE */
/* JOHN FRASER */

#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/errno.h>
#include <fcntl.h>
#include <mach/machine/vm_param.h>
#include "cache.h"

/* /DEV/MMAP DEFINITIONS */
#define k2phys(addr) (((long)(addr)) & 0xffffffff)
#define SM_MODE (MAP_FILE|MAP_VARIABLE|MAP_SHARED)
#define SM_PROT (PROT_READ|PROT_WRITE)

/* SHARED CACHE DATA POINTER */
datablock* psatom;

/* ADDRESS MAPPING FUNCTIONS */
void FatalError(char* string)
{
    fprintf(stderr, "ucache: %s\n", string);
    exit(1);
}

long GetAddress(char* vmunixDebug, char* symbol)
{
    long addr;
    char command[200];
    int fields;
    FILE* file;

    sprintf(command, "nm -B %s | grep ' %s$'", vmunixDebug, symbol);
    file = popen(command, "r");
    if (file==NULL)
    {
        fprintf(stderr, "Open failed: %s\n", command);
        exit(1);
    }
    fields = fscanf(file,"0x%lx",&addr);
    if (fields!=1) FatalError("Get address failed");
    pclose(file);
    return addr;
}

/* INITIALIZATION ROUTINE */
void initcache(int proc)
{
    /* GET POINTER TO SHARED DATA IN KERNEL */
    caddr_t sm_addr;
    size_t length;
    off_t sm_physbase, sm_pgoff;
    unsigned long kbase = GetAddress("vmunix.debug", "satom");
    int fd = open("/dev/mem", O_RDWR, 0);
    if (fd<0) FatalError("Unable to open /dev/mem\n");
    sm_physbase = k2phys(alpha_trunc_page(kbase));
    sm_pgoff = kbase & (ALPHA_PGBYTES-1);
    length = alpha_round_page(sm_pgoff + sizeof(datablock));
    sm_addr = mmap(NULL, length, SM_PROT, SM_MODE, fd, sm_physbase);
    if (sm_addr == (caddr_t)-1) FatalError("mmap failed\n");
    psatom = (datablock*)((long)sm_addr | (long)sm_pgoff);

    /* INCREMENT PROCESS COUNTER */
    psatom->count++;

    /* IF FIRST PROCESS, INITIALIZE CACHE DATA */
    if (proc == 1)
    {
        int tempnumcaches, tempnumtasks;
        int x,a,b,c,d;
        FILE *input, *output;

        /* LOAD BASIC CHARACTERISTICS FROM FILE */
        input = fopen("cache.in","r");
        fgets(psatom->name[0], 79, input);
        fscanf(input, "%d\n", &tempnumtasks);
        for (x=1; x<tempnumtasks; x++)
            fgets(psatom->name[x], 79, input);
        fscanf(input, "%d\n", &tempnumcaches);
        for (x=0; x<tempnumcaches; x++)
        {
            fscanf(input, "%d\n", &(psatom->para[x]).type);
            if ((psatom->para[x]).type == 0)
                fscanf(input, "%d %d %d\n", &(psatom->para[x]).csize[0],
                       &(psatom->para[x]).lsize[0],
                       &(psatom->para[x]).assoc[0]);
            else
                fscanf(input, "%d %d %d %d %d %d\n", &(psatom->para[x]).csize[0],
                       &(psatom->para[x]).lsize[0],
                       &(psatom->para[x]).assoc[0],
                       &(psatom->para[x]).csize[1],
                       &(psatom->para[x]).lsize[1],
                       &(psatom->para[x]).assoc[1]);
        }

        /* SET ADDRESS HASHING PARAMETERS */
        for (a=0; a<tempnumcaches; a++)
            for (b=0; b<((psatom->para[a]).type + 1); b++)
            {
                (psatom->para[a]).tshift[b] = mylog2((psatom->para[a]).csize[b]/
                                                     (psatom->para[a]).assoc[b]);
                (psatom->para[a]).lshift[b] = mylog2((psatom->para[a]).lsize[b]);
                (psatom->para[a]).lmask[b] = ((psatom->para[a]).csize[b]/
                                              (psatom->para[a]).assoc[b])-1;
            }

        /* INITIALIZE CACHE STORAGE */
        for (a=0; a<tempnumcaches; a++)
            for (b=0; b<((psatom->para[a]).type + 1); b++)
                for (c=0; c<((psatom->para[a]).csize[b]/
                             ((psatom->para[a]).lsize[b] *
                              (psatom->para[a]).assoc[b])); c++)
                    for (d=0; d<(psatom->para[a]).assoc[b]; d++)
                    {
                        (psatom->data[a][b][c][d]).use = 0;
                        (psatom->data[a][b][c][d]).task = tempnumtasks;
                    }

        /* INITIALIZE CACHE STATISTICS */
        for (a=0; a<tempnumcaches; a++)
            for (b=0; b<tempnumtasks; b++)
            {
                (psatom->stat[a][b]).instcnt = 0;
                (psatom->stat[a][b]).readcnt = 0;
                (psatom->stat[a][b]).writcnt = 0;
                (psatom->stat[a][b]).instmisscnt = 0;
                (psatom->stat[a][b]).readmisscnt = 0;
                (psatom->stat[a][b]).writmisscnt = 0;
                for (c=0; c <= tempnumtasks; c++)
                    (psatom->stat[a][b]).interfere[c] = 0;
            }

        /* LOG SIMULATION DATA TO OUTPUT FILE */
        output = fopen("cache.out","w");
        fprintf(output, "\n\n\n\n\n\n\n\n");
        fprintf(output,"<><><><><><><><><><><><><><><><><>\n");
        fprintf(output, "SIMULATION: %s", psatom->name[0]);
        fprintf(output,"<><><><><><><><><><><><><><><><><>\n");
        fprintf(output, "\n\n\n\n");
        fprintf(output,"Number Tasks = %d\n\n", tempnumtasks);
        fprintf(output," #0: kernel\n\n");
        for (x=1; x<tempnumtasks; x++)
            fprintf(output," #%d: %s", x, psatom->name[x]);
        fprintf(output, "\n\n\n\n");
        fprintf(output, "Number Caches = %d\n", tempnumcaches);
        fprintf(output," (type, icsize, ilsize, iassoc, dcsize, dlsize, dassoc)\n\n");
        for (x=0; x<tempnumcaches; x++)
        {
            fprintf(output," #%d: %1d %7d %5d %3d", x,
                    (psatom->para[x]).type,
                    (psatom->para[x]).csize[0],
                    (psatom->para[x]).lsize[0],
                    (psatom->para[x]).assoc[0]);
            if ((psatom->para[x]).type == 1)
                fprintf(output," %7d %5d %3d", (psatom->para[x]).csize[1],
                        (psatom->para[x]).lsize[1],
                        (psatom->para[x]).assoc[1]);
            fprintf(output,"\n\n");
        }
        fprintf(output,"\f");
        fclose(output);

        /* START CAPTURE & SIMULATION */
        psatom->numtasks = tempnumtasks;
        psatom->numcaches = tempnumcaches;
        psatom->actcaches = tempnumcaches;
        psatom->curtask = -1;
    }

    return;
}


/* INSTRUCTION REFERENCE ROUTINE */
void instref(long addr, int proc, int count)
{
    int x, leastx;
    unsigned long leastused;
    long aline, atag;
    int cnum, hit;

    /* PAUSE CAPTURE (RE-ENTRANCE) */
    int tempnumcaches = psatom->numcaches;
    psatom->numcaches = 0;

    /* RE-ESTABLISH AFTER CONTEXT SWITCH (RE-ENTRANCE) */
    if (psatom->curtask != proc)
    {
        tempnumcaches = psatom->actcaches;
        psatom->curtask = proc;
    }

    /* PROCESS REFERENCES IN EACH CACHE */
    for (cnum=0; cnum<tempnumcaches; cnum++)
    {
        int assoc = (psatom->para[cnum]).assoc[0];

        /* UPDATE STATISTICS */
        ((psatom->stat[cnum][proc]).instcnt) += count;

        /* PARSE ADDRESS */
        aline = (addr & (psatom->para[cnum]).lmask[0]) >>
                (psatom->para[cnum]).lshift[0];
        atag = addr >> (psatom->para[cnum]).tshift[0];

        /* UPDATE 'USE BITS' AND CHECK FOR HIT */
        hit = 0;
        for (x=0; x<assoc; x++)
        {
            ((psatom->data[cnum][0][aline][x]).use)++;
            if (((psatom->data[cnum][0][aline][x]).tag == atag) &&
                ((psatom->data[cnum][0][aline][x]).task == proc))
            {
                (psatom->data[cnum][0][aline][x]).use = 0;
                hit = 1;
            }
        }

        /* IF NO HIT, FIND LRU BLOCK TO EVICT */
        if (hit == 0)
        {
            /* FIND LRU */
            leastused = 0;
            for (x=0; x<assoc; x++)
            {
                if (((psatom->data[cnum][0][aline][x]).use >= leastused) ||
                    ((psatom->data[cnum][0][aline][x]).task ==
                     psatom->numtasks))
                {
                    leastused = (psatom->data[cnum][0][aline][x]).use;
                    leastx = x;
                }
                if ((psatom->data[cnum][0][aline][x]).task ==
                    psatom->numtasks)
                    x = assoc;
            }

            /* UPDATE STATISTICS */
            ((psatom->stat[cnum][proc]).instmisscnt)++;
            ((psatom->stat[cnum][proc]).interfere[
                (psatom->data[cnum][0][aline][leastx]).task])++;

            /* UPDATE CACHE DATA */
            (psatom->data[cnum][0][aline][leastx]).tag = atag;
            (psatom->data[cnum][0][aline][leastx]).use = 0;
            (psatom->data[cnum][0][aline][leastx]).task = proc;
        }
    }

    /* RESUME CAPTURE */
    psatom->numcaches = tempnumcaches;
    return;
}


/* DATA LOAD ROUTINE */
void readref(long addr, int proc)
{
    int index;
    int x, leastx;
    unsigned long leastused;
    long aline, atag;
    int cnum, hit;

    /* PAUSE CAPTURE (RE-ENTRANCE) */
    int tempnumcaches = psatom->numcaches;
    psatom->numcaches = 0;

    /* RE-ESTABLISH AFTER CONTEXT SWITCH (RE-ENTRANCE) */
    if (psatom->curtask != proc)
    {
        tempnumcaches = psatom->actcaches;
        psatom->curtask = proc;
    }

    /* PROCESS REFERENCE IN EACH CACHE */
    for (cnum=0; cnum<tempnumcaches; cnum++)
    {
        int type = (psatom->para[cnum]).type;
        int assoc = (psatom->para[cnum]).assoc[type];

        /* UPDATE STATISTICS */
        ((psatom->stat[cnum][proc]).readcnt)++;

        /* PARSE ADDRESS */
        aline = (addr & (psatom->para[cnum]).lmask[type]) >>
                (psatom->para[cnum]).lshift[type];
        atag = addr >> (psatom->para[cnum]).tshift[type];

        /* UPDATE 'USE BITS' AND CHECK FOR HIT */
        hit = 0;
        for (x=0; x<assoc; x++)
        {
            ((psatom->data[cnum][type][aline][x]).use)++;
            if (((psatom->data[cnum][type][aline][x]).tag == atag) &&
                ((psatom->data[cnum][type][aline][x]).task == proc))
            {
                (psatom->data[cnum][type][aline][x]).use = 0;
                hit = 1;
            }
        }

        /* IF NO HIT, FIND LRU BLOCK TO EVICT */
        if (hit == 0)
        {
            /* FIND LRU */
            leastused = 0;
            for (x=0; x<assoc; x++)
            {
                if (((psatom->data[cnum][type][aline][x]).use >= leastused) ||
                    ((psatom->data[cnum][type][aline][x]).task ==
                     psatom->numtasks))
                {
                    leastused = (psatom->data[cnum][type][aline][x]).use;
                    leastx = x;
                }
                if ((psatom->data[cnum][type][aline][x]).task ==
                    psatom->numtasks)
                    x = assoc;
            }

            /* UPDATE STATISTICS */
            ((psatom->stat[cnum][proc]).readmisscnt)++;
            ((psatom->stat[cnum][proc]).interfere[
                (psatom->data[cnum][type][aline][leastx]).task])++;

            /* UPDATE CACHE DATA */
            (psatom->data[cnum][type][aline][leastx]).tag = atag;
            (psatom->data[cnum][type][aline][leastx]).use = 0;
            (psatom->data[cnum][type][aline][leastx]).task = proc;
        }
    }

    /* RESUME CAPTURE */
    psatom->numcaches = tempnumcaches;
    return;
}

/* DATA STORE ROUTINE */
void writref(long addr, int proc)
{
    int index;
    int x, leastx;
    unsigned long leastused;
    long aline, atag;
    int cnum, hit;

    /* PAUSE CAPTURE (RE-ENTRANCE) */
    int tempnumcaches = psatom->numcaches;
    psatom->numcaches = 0;

    /* RE-ESTABLISH AFTER CONTEXT SWITCH (RE-ENTRANCE) */
    if (psatom->curtask != proc)
    {
        tempnumcaches = psatom->actcaches;
        psatom->curtask = proc;
    }

    /* PROCESS REFERENCE IN EACH CACHE */
    for (cnum=0; cnum<tempnumcaches; cnum++)
    {
        int type = (psatom->para[cnum]).type;
        int assoc = (psatom->para[cnum]).assoc[type];

        /* UPDATE STATISTICS */
        ((psatom->stat[cnum][proc]).writcnt)++;

        /* PARSE ADDRESS */
        aline = (addr & (psatom->para[cnum]).lmask[type]) >>
                (psatom->para[cnum]).lshift[type];
        atag = addr >> (psatom->para[cnum]).tshift[type];

        /* UPDATE 'USE BITS' AND CHECK FOR HIT */
        hit = 0;
        for (x=0; x<assoc; x++)
        {
            ((psatom->data[cnum][type][aline][x]).use)++;
            if (((psatom->data[cnum][type][aline][x]).tag == atag) &&
                ((psatom->data[cnum][type][aline][x]).task == proc))
            {
                (psatom->data[cnum][type][aline][x]).use = 0;
                hit = 1;
            }
        }

        /* IF NOT HIT, FIND LRU BLOCK TO EVICT */
        if (hit == 0)
        {
            /* FIND LRU */
            leastused = 0;
            for (x=0; x<assoc; x++)
            {
                if (((psatom->data[cnum][type][aline][x]).use >= leastused) ||
                    ((psatom->data[cnum][type][aline][x]).task ==
                     psatom->numtasks))
                {
                    leastused = (psatom->data[cnum][type][aline][x]).use;
                    leastx = x;
                }
                if ((psatom->data[cnum][type][aline][x]).task ==
                    psatom->numtasks)
                    x = assoc;
            }

            /* UPDATE STATISTICS */
            ((psatom->stat[cnum][proc]).writmisscnt)++;
            ((psatom->stat[cnum][proc]).interfere[
                (psatom->data[cnum][type][aline][leastx]).task])++;

            /* UPDATE CACHE DATA */
            (psatom->data[cnum][type][aline][leastx]).tag = atag;
            (psatom->data[cnum][type][aline][leastx]).use = 0;
            (psatom->data[cnum][type][aline][leastx]).task = proc;
        }
    }

    /* RESUME CAPTURE */
    psatom->numcaches = tempnumcaches;
    return;
}

/* STORE RESULTS ROUTINE */
void printres(int proc)
{
    int c,x,y;
    stats total;
    FILE* file;

    /* PAUSE CAPTURE */
    int tempnumcaches = psatom->actcaches;
    psatom->numcaches = 0;

    /* OPEN FILE FOR OUTPUT */
    file = fopen("cache.out","a");
    fprintf(file,"DATA AT END OF PROCESS %d\n",proc);
    fprintf(file,"<><><><><><><><><><><><><><><><><><><><><><><><><>\n");

    /* PRINT DATA FOR EACH CACHE */
    for (c=0; c<tempnumcaches; c++)
    {
        fprintf(file,"simulation: %s (data at end of process %d)\n",
                psatom->name[0],proc);
        fprintf(file," - \n");
        fprintf(file,"CACHE # %d\n", c);
        fprintf(file,"cache type: %d (0=unified, 1=split)\n",
                (psatom->para[c]).type);
        fprintf(file,"icache size: %d\n",(psatom->para[c]).csize[0]);
        fprintf(file,"icache line size: %d\n",(psatom->para[c]).lsize[0]);
        fprintf(file,"icache associativity: %d\n",
                (psatom->para[c]).assoc[0]);
        if ((psatom->para[c]).type == 1)
        {
            fprintf(file,"dcache size: %d\n",(psatom->para[c]).csize[1]);
            fprintf(file,"dcache line size: %d\n",(psatom->para[c]).lsize[1]);
            fprintf(file,"dcache associativity: %d\n",
                    (psatom->para[c]).assoc[1]);
        }

        total.instcnt = 0;
        total.readcnt = 0;
        total.writcnt = 0;
        total.instmisscnt = 0;
        total.readmisscnt = 0;
        total.writmisscnt = 0;

        /* PRINT PROCESS CACHE PERFORMANCE */
        for (y=0; y < psatom->numtasks; y++)
        {
            int z;

            total.instcnt = total.instcnt + (psatom->stat[c][y]).instcnt;
            total.readcnt = total.readcnt + (psatom->stat[c][y]).readcnt;
            total.writcnt = total.writcnt + (psatom->stat[c][y]).writcnt;
            total.instmisscnt = total.instmisscnt +
                                (psatom->stat[c][y]).instmisscnt;
            total.readmisscnt = total.readmisscnt +
                                (psatom->stat[c][y]).readmisscnt;
            total.writmisscnt = total.writmisscnt +
                                (psatom->stat[c][y]).writmisscnt;
            fprintf(file," **********\n");
            fprintf(file," Process #%d\n", y);
            fprintf(file," Inst %12lu ", (psatom->stat[c][y]).instcnt);
            fprintf(file,"Miss %12lu ", (psatom->stat[c][y]).instmisscnt);
            if ((psatom->stat[c][y]).instcnt != 0)
                fprintf(file,"Perc %.6lf", 100.0 *
                        (psatom->stat[c][y]).instmisscnt /
                        (psatom->stat[c][y]).instcnt);
            fprintf(file,"\n Data %12lu ", (psatom->stat[c][y]).readcnt +
                    (psatom->stat[c][y]).writcnt);
            fprintf(file,"Miss %12lu ", (psatom->stat[c][y]).readmisscnt +
                    (psatom->stat[c][y]).writmisscnt);
            if (((psatom->stat[c][y]).readcnt+(psatom->stat[c][y]).writcnt) != 0)
                fprintf(file,"Perc %.6lf", 100.0 *
                        ((psatom->stat[c][y]).readmisscnt +
                         (psatom->stat[c][y]).writmisscnt) /
                        ((psatom->stat[c][y]).readcnt +
                         (psatom->stat[c][y]).writcnt));
            fprintf(file,"\n read %12lu ",
                    (psatom->stat[c][y]).readcnt);
            fprintf(file,"Miss %12lu ", (psatom->stat[c][y]).readmisscnt);
            if ((psatom->stat[c][y]).readcnt != 0)
                fprintf(file,"Perc %.6lf", 100.0 *
                        (psatom->stat[c][y]).readmisscnt /
                        (psatom->stat[c][y]).readcnt);
            fprintf(file,"\n writ %12lu ", (psatom->stat[c][y]).writcnt);
            fprintf(file,"Miss %12lu ", (psatom->stat[c][y]).writmisscnt);
            if ((psatom->stat[c][y]).writcnt != 0)
                fprintf(file,"Perc %.6lf", 100.0 *
                        (psatom->stat[c][y]).writmisscnt /
                        (psatom->stat[c][y]).writcnt);
            fprintf(file,"\n TOTAL %12lu ", (psatom->stat[c][y]).instcnt +
                    (psatom->stat[c][y]).readcnt +
                    (psatom->stat[c][y]).writcnt);
            fprintf(file,"Miss %12lu ", (psatom->stat[c][y]).instmisscnt +
                    (psatom->stat[c][y]).readmisscnt +
                    (psatom->stat[c][y]).writmisscnt);
            if (((psatom->stat[c][y]).instcnt +
                 (psatom->stat[c][y]).readcnt +
                 (psatom->stat[c][y]).writcnt) != 0)
                fprintf(file,"Perc %.6lf", 100.0 *
                        ((psatom->stat[c][y]).instmisscnt +
                         (psatom->stat[c][y]).readmisscnt +
                         (psatom->stat[c][y]).writmisscnt) /
                        ((psatom->stat[c][y]).instcnt +
                         (psatom->stat[c][y]).readcnt +
                         (psatom->stat[c][y]).writcnt));
            fprintf(file,"\n Int (times process %d overwrote:)\n", y);
            for (z=0; z <= psatom->numtasks; z++)
                fprintf(file," Process %d = %12lu\n", z,
                        (psatom->stat[c][y]).interfere[z]);
            fprintf(file," (process %d is invalid data)\n",
                    psatom->numtasks);
        }

        /* PRINT TOTAL CACHE PERFORMANCE */
        fprintf(file," **********\n");
        fprintf(file," TOTAL FOR CACHE\n");
        fprintf(file," Inst %12lu ", total.instcnt);
        fprintf(file,"Miss %12lu ", total.instmisscnt);
        if (total.instcnt != 0)
            fprintf(file,"Perc %.6lf", 100.0 * total.instmisscnt /
                    total.instcnt);
        fprintf(file,"\n Data %12lu ", total.readcnt +
                total.writcnt);
        fprintf(file,"Miss %12lu ", total.readmisscnt + total.writmisscnt);
        if ((total.readcnt + total.writcnt) != 0)
            fprintf(file,"Perc %.6lf", 100.0 *
                    (total.readmisscnt + total.writmisscnt) /
                    (total.readcnt + total.writcnt));
        fprintf(file,"\n read %12lu ", total.readcnt);
        fprintf(file,"Miss %12lu ", total.readmisscnt);
        if (total.readcnt != 0)
            fprintf(file,"Perc %.6lf", 100.0 * total.readmisscnt /
                    total.readcnt);
        fprintf(file,"\n writ %12lu ", total.writcnt);
        fprintf(file,"Miss %12lu ", total.writmisscnt);
        if (total.writcnt != 0)
            fprintf(file,"Perc %.6lf", 100.0 * total.writmisscnt /
                    total.writcnt);
        fprintf(file,"\n TOTAL %12lu ", total.instcnt +
                total.readcnt +
                total.writcnt);
        fprintf(file,"Miss %12lu ", total.instmisscnt +
                total.readmisscnt +
                total.writmisscnt);
        if ((total.instcnt + total.readcnt + total.writcnt) != 0)
            fprintf(file,"Perc %.6lf", 100.0 *
                    (total.instmisscnt +
                     total.readmisscnt +
                     total.writmisscnt) /
                    (total.instcnt +
                     total.readcnt +
                     total.writcnt));
        fprintf(file,"\n");
        fprintf(file,"\f");
    }

    fclose(file);

    /* IF LAST PROCESS, SHUT DOWN SIMULATION */
    psatom->count--;
    if (psatom->count > 0)
    {
        psatom->numcaches = tempnumcaches;
        psatom->curtask = proc;
    }

    return;
}




A.8 Sample Tool Description File

To create an ATOM tool, a tool description file must be written that defines the tool's
characteristics, such as the files to incorporate and the control flags to use. The example
below, kexe.desc, is the tool used to create the executable version of the kernel. For more
information, please refer to the ATOM source documents.

INST_FILE kern.inst.c
ANAL_FILE kern.anal.c
ANAL_LDFLAGS -non_shared
ATOM_REQ -Xkernel -Xgprog
ATOM_DEF -o vmunix.cache

Another example is the tool used for the context switch model, mod.desc, which shows
the -lm flag required to use functions from the libm.a library.

INST_FILE prog.inst.c
ANAL_FILE model.anal.c
ANAL_LDFLAGS -lm




A.9 Model Library

The following file, model.h, was used as a procedure library for the context switch model
implementation. It is used in conjunction with the cache model library.

/* MODEL.H */
/* CONTEXT SWITCH MODEL LIBRARY */
/* JOHN FRASER */

#include <stdlib.h>
#include <math.h>

/* COMPUTE RANDOM EXECUTION INTERVAL */
long compint()
{
    long temp = random();
    temp = (long)trunc(-50000.0*log(1.0-(random()/(pow(2.0,31.0)-1.0))));

    /* INTERVAL CAP */
    if (temp > 250000)
        return(250000);
    else
        return(temp);
}

/* COMPUTE FACTORIAL FUNCTION */
double myfact(long x)
{
    if (x == 0)
        return(1.0);
    else
        return((double)x * myfact(x-1));
}

/* COMPUTE COMBINATORIAL FUNCTION */
double mycomb(long F, long i)
{
    long x;
    double temp3 = 1.0/myfact(i);

    /* CANT USE STANDARD FACTORIAL EXPRESSION => OVERFLOW ERROR */
    for (x=F; x>F-i; x--)
        temp3 = temp3 * x;
    return(temp3);
}

/* COMPUTE BLOCK OVERWRITE PROBABILITY */
double calcprob(long F, int C, int B, int A, int i)
{
    int x;
    double temp2 = 0.0;
    int N = C/(B*A);

    if (i < A)
    {
        double a,b,c;

        a = (double)(mycomb(F,i));
        b = (double)(pow((1.0/(double)N),(double)i));

        /* UNDERFLOW TEST FOR LAST TERM */
        if ((F-i)*log(1.0-(1.0/(double)N)) < -600.0)
            c = 0;
        else
            c = (double)pow((1.0-(1.0/(double)N)),((double)(F-i)));
        return(a*b*c);
    }
    else
    {
        for (x=0; x < A; x++)
            temp2 = temp2 + ((double)(mycomb(F,x)) *
                             (pow((1.0/N),x)) *
                             (pow((1.0-(1.0/N)),(F-x))));
        return(1.0 - temp2);
    }
}

/* COMPUTE INSTRUCTION FOOTPRINT */
long ifoot(long R, int B)
{
    return((long)trunc(R/(50.0*B)));
}

/* COMPUTE DATA FOOTPRINT */
long dfoot(long R)
{
    return((long)trunc(R/50.0));
}




A.10 Model Analysis File

The files used to test the context switch model were very similar to those used in the first set
of simulations. The program instrumentation file was identical, and the analysis file, model.anal.c,
was generally the same apart from the addition of the model code, as shown. Since the model
was tested with a single process trace, the re-entrance mechanisms were not required.

/* MODEL.ANAL.C */
/* PROGRAM ANALYSIS FILE */
/* W/ CONTEXT SWITCH MODEL */
/* JOHN FRASER */

#include <stdio.h>
#include "cache.h"
#include "model.h"

/* CACHE DATA */
datablock satom;
datablock* psatom;

/* MODEL DATA */
unsigned long switchnext;
unsigned long switchcnt;
unsigned long switchrec;

/* INITIALIZATION ROUTINE */
void initcache(int proc)
{
    /* SET POINTER TO CACHE DATA */
    psatom = &satom;

    /* INITIALIZE BASIC DATA */
    psatom->count = 0;
    psatom->numcaches = 0;
    psatom->numtasks = 0;

    /* INITIALIZE SWITCH MODEL */
    switchcnt = 0;
    switchrec = 0;
    switchnext = compint();

    /* IF FIRST PROCESS, INITIALIZE CACHE DATA */
    psatom->count++;
    if (psatom->count == 1)
    {
        int tempnumcaches,tempnumtasks;
        int x,a,b,c,d;
        FILE *input, *output;

        /* LOAD BASIC CHARACTERISTICS FROM FILE */
        input = fopen("cache.in","r");
        fgets(psatom->name[0], 79, input);
        fscanf(input,"%d\n",&tempnumtasks);
        for (x=1; x<tempnumtasks; x++)
            fgets(psatom->name[x], 79, input);
        fscanf(input,"%d\n",&tempnumcaches);
        for (x=0; x<tempnumcaches; x++)
        {
            fscanf(input,"%d\n", &(psatom->para[x]).type);
            if ((psatom->para[x]).type == 0)
                fscanf(input,"%d %d %d\n", &(psatom->para[x]).csize[0],
                       &(psatom->para[x]).bsize[0],
                       &(psatom->para[x]).assoc[0]);
            else
                fscanf(input,"%d %d %d %d %d %d\n", &(psatom->para[x]).csize[0],
                       &(psatom->para[x]).bsize[0],
                       &(psatom->para[x]).assoc[0],
                       &(psatom->para[x]).csize[1],
                       &(psatom->para[x]).bsize[1],
                       &(psatom->para[x]).assoc[1]);
        }

        /* SET ADDRESS HASHING PARAMETERS */
        for (a=0; a<tempnumcaches; a++)
            for (b=0; b<((psatom->para[a]).type + 1); b++)
            {
                (psatom->para[a]).tshift[b] = mylog2((psatom->para[a]).csize[b] /
                                                     (psatom->para[a]).assoc[b]);
                (psatom->para[a]).lshift[b] = mylog2((psatom->para[a]).bsize[b]);
                (psatom->para[a]).lmask[b] = ((psatom->para[a]).csize[b] /
                                              (psatom->para[a]).assoc[b])-1;
            }

        /* INITIALIZE CACHE STORAGE */
        for (a=0; a<tempnumcaches; a++)
            for (b=0; b<((psatom->para[a]).type + 1); b++)
                for (c=0; c<((psatom->para[a]).csize[b] /
                             ((psatom->para[a]).bsize[b] *
                              (psatom->para[a]).assoc[b])); c++)
                    for (d=0; d<(psatom->para[a]).assoc[b]; d++)
                    {
                        (psatom->data[a][b][c][d]).use = 0;
                        (psatom->data[a][b][c][d]).task = tempnumtasks;
                    }

        /* INITIALIZE CACHE STATISTICS */
        for (a=0; a<tempnumcaches; a++)
            for (b=0; b<tempnumtasks; b++)
            {
                (psatom->stat[a][b]).instcnt = 0;
                (psatom->stat[a][b]).readcnt = 0;
                (psatom->stat[a][b]).writcnt = 0;
                (psatom->stat[a][b]).instmisscnt = 0;
                (psatom->stat[a][b]).readmisscnt = 0;
                (psatom->stat[a][b]).writmisscnt = 0;
                for (c=0; c <= tempnumtasks; c++)
                    (psatom->stat[a][b]).interfere[c] = 0;
            }

        /* LOG SIMULATION DATA TO OUTPUT FILE */
        output = fopen("cache.out","w");
        fprintf(output,"\n\n\n\n\n\n\n\n");
        fprintf(output,"<><><><><><><><><><><><><><><><><><><><><><>\n");
        fprintf(output,"SIMULATION (single) : %s",psatom->name[0]);
        fprintf(output,"<><><><><><><><><><><><><><><><><><><><><><>\n");
        fprintf(output,"\n\n\n\n");
        fprintf(output,"Number Tasks = %d\n\n",tempnumtasks);
        for (x=1; x<tempnumtasks; x++)
            fprintf(output," #%d: %s\n",x,psatom->name[x]);
        fprintf(output,"\n\n\n\n");
        fprintf(output,"Number Caches = %d\n",tempnumcaches);
        fprintf(output," (type, icsize, ibsize, iassoc, "
                       "dcsize, dbsize, dassoc)\n\n");
        for (x=0; x<tempnumcaches; x++)
        {
            fprintf(output," #%d: %1d %7d %5d %3d",x,
                    (psatom->para[x]).type,
                    (psatom->para[x]).csize[0],
                    (psatom->para[x]).bsize[0],
                    (psatom->para[x]).assoc[0]);
            if ((psatom->para[x]).type == 1)
                fprintf(output," %7d %5d %3d",(psatom->para[x]).csize[1],
                        (psatom->para[x]).bsize[1],
                        (psatom->para[x]).assoc[1]);
            fprintf(output,"\n\n");
        }
        fprintf(output,"\f");
        fclose(output);

        /* START SIMULATION */
        psatom->numtasks = tempnumtasks;
        psatom->numcaches = tempnumcaches;
    }

    return;
}


/* INSTRUCTION REFERENCE ROUTINE */
void instref(long addr, int proc, int count)
{
    int x, leastx;
    unsigned long leastused;
    long aline, atag;
    int cnum, hit;

    /* PROCESS REFERENCES IN EACH CACHE */
    for (cnum=0; cnum < psatom->numcaches; cnum++)
    {
        int assoc = (psatom->para[cnum]).assoc[0];

        /* UPDATE STATISTICS */
        ((psatom->stat[cnum][proc]).instcnt) += count;

        /* PARSE ADDRESS */
        aline = (addr & (psatom->para[cnum]).lmask[0]) >>
                (psatom->para[cnum]).lshift[0];
        atag = addr >> (psatom->para[cnum]).tshift[0];

        /* UPDATE 'USE BITS' AND CHECK FOR HIT */
        hit = 0;
        for (x=0; x<assoc; x++)
        {
            ((psatom->data[cnum][0][aline][x]).use)++;
            if (((psatom->data[cnum][0][aline][x]).tag == atag) &&
                ((psatom->data[cnum][0][aline][x]).task == proc))
            {
                (psatom->data[cnum][0][aline][x]).use = 0;
                hit = 1;
            }
        }

        /* IF NO HIT, FIND LRU BLOCK TO EVICT */
        if (hit == 0)
        {
            /* FIND LRU */
            leastused = 0;
            for (x=0; x<assoc; x++)
            {
                if (((psatom->data[cnum][0][aline][x]).use >= leastused) ||
                    ((psatom->data[cnum][0][aline][x]).task ==
                     psatom->numtasks))
                {
                    leastused = (psatom->data[cnum][0][aline][x]).use;
                    leastx = x;
                }
                if ((psatom->data[cnum][0][aline][x]).task ==
                    psatom->numtasks)
                    x = assoc;
            }

            /* UPDATE STATISTICS */
            ((psatom->stat[cnum][proc]).instmisscnt)++;
            ((psatom->stat[cnum][proc]).interfere[
                (psatom->data[cnum][0][aline][leastx]).task])++;

            /* UPDATE CACHE DATA */
            (psatom->data[cnum][0][aline][leastx]).tag = atag;
            (psatom->data[cnum][0][aline][leastx]).use = 0;
            (psatom->data[cnum][0][aline][leastx]).task = proc;
        }
    }

    /* INCREMENT SWITCH COUNTER */
    switchcnt += count;

    /* CHECK FOR CONTEXT SWITCH AND PERFORM */
    if (switchcnt >= switchnext)
    {
        unsigned long intercnt;
        long foot;
        int sec;
        double prob,prbcnt;

        /* COMPUTE INTERRUPTION INTERVAL */
        intercnt = (psatom->numtasks-1) * compint();

        /* APPLY IMPACT TO EACH CACHE */
        for (cnum=0; cnum < psatom->numcaches; cnum++)
        {
            /* APPLY IMPACT TO EACH SECTION (INST/DATA) */
            for (sec=0; sec<=(psatom->para[cnum]).type; sec++)
            {
                /* COMPUTE FOOTPRINT FOR EACH SECTION */
                if (sec==0)
                {
                    foot = ifoot(intercnt, ((psatom->para[cnum]).bsize[sec] / 4));
                    if ((psatom->para[cnum]).type == 0)
                        foot = foot + dfoot(intercnt);
                }
                else
                    foot = dfoot(intercnt);

                /* ITERATE THROUGH EACH LINE OVERWRITING RANDOM BLOCK(S) */
                for (aline=0; aline < (psatom->para[cnum]).csize[sec] /
                                      ((psatom->para[cnum]).bsize[sec] *
                                       (psatom->para[cnum]).assoc[sec]); aline++)
                {
                    /* GENERATE LINE'S PROBABILITY */
                    prob = (double)random()/(pow(2.0,31.0)-1.0);

                    /* COMPUTE PROBABILITY OF FIRST OVERWRITE */
                    prbcnt = calcprob(foot,
                                      (psatom->para[cnum]).csize[sec],
                                      (psatom->para[cnum]).bsize[sec],
                                      (psatom->para[cnum]).assoc[sec],
                                      0);

                    /* ITERATE UNTIL ALL OVERWRITTEN OR PROBABILITY FAILS */
                    for (hit=0; ((hit < (psatom->para[cnum]).assoc[sec]) &&
                                 (prob > prbcnt)); hit++)
                    {
                        /* COMPUTE PROBABILITY OF NEXT OVERWRITE */
                        if (hit < ((psatom->para[cnum]).assoc[sec] - 1))
                            prbcnt += calcprob(foot,
                                               (psatom->para[cnum]).csize[sec],
                                               (psatom->para[cnum]).bsize[sec],
                                               (psatom->para[cnum]).assoc[sec],
                                               hit+1);

                        /* FIND LRU BLOCK TO EVICT */
                        leastused = 0;
                        for (x=0; x < (psatom->para[cnum]).assoc[sec]; x++)
                        {
                            /* UPDATE 'USE BITS' */
                            (psatom->data[cnum][sec][aline][x]).use++;
                            if ((psatom->data[cnum][sec][aline][x]).use >= leastused)
                            {
                                leastused = (psatom->data[cnum][sec][aline][x]).use;
                                leastx = x;
                            }
                        }

                        /* UPDATE CACHE DATA */
                        (psatom->data[cnum][sec][aline][leastx]).use = 0;
                        (psatom->data[cnum][sec][aline][leastx]).task =
                            (psatom->numtasks - 1);
                    }
                }
            }
        }

        /* RESET FOR NEXT INTERVAL */
        switchrec++;
        switchcnt = 0;
        switchnext = compint();
    }

    return;
}


/* DATA LOAD ROUTINE */
void readref(long addr, int proc)
{
    int index;
    int x, leastx;
    unsigned long leastused;
    long aline, atag;
    int cnum, hit;

    /* PROCESS REFERENCE IN EACH CACHE */
    for (cnum=0; cnum<psatom->numcaches; cnum++)
    {
        int type = (psatom->para[cnum]).type;
        int assoc = (psatom->para[cnum]).assoc[type];

        /* UPDATE STATISTICS */
        ((psatom->stat[cnum][proc]).readcnt)++;

        /* PARSE ADDRESS */
        aline = (addr & (psatom->para[cnum]).lmask[type]) >>
                (psatom->para[cnum]).lshift[type];
        atag = addr >> (psatom->para[cnum]).tshift[type];

        /* UPDATE 'USE BITS' AND CHECK FOR HIT */
        hit = 0;
        for (x=0; x<assoc; x++)
        {
            ((psatom->data[cnum][type][aline][x]).use)++;
            if (((psatom->data[cnum][type][aline][x]).tag == atag) &&
                ((psatom->data[cnum][type][aline][x]).task == proc))
            {
                (psatom->data[cnum][type][aline][x]).use = 0;
                hit = 1;
            }
        }

        /* IF NO HIT, FIND LRU BLOCK TO EVICT */
        if (hit == 0)
        {
            /* FIND LRU */
            leastused = 0;
            for (x=0; x<assoc; x++)
            {
                if (((psatom->data[cnum][type][aline][x]).use >= leastused) ||
                    ((psatom->data[cnum][type][aline][x]).task ==
                     psatom->numtasks))
                {
                    leastused = (psatom->data[cnum][type][aline][x]).use;
                    leastx = x;
                }
                if ((psatom->data[cnum][type][aline][x]).task ==
                    psatom->numtasks)
                    x = assoc;
            }

            /* UPDATE STATISTICS */
            ((psatom->stat[cnum][proc]).readmisscnt)++;
            ((psatom->stat[cnum][proc]).interfere[
                (psatom->data[cnum][type][aline][leastx]).task])++;

            /* UPDATE CACHE DATA */
            (psatom->data[cnum][type][aline][leastx]).tag = atag;
            (psatom->data[cnum][type][aline][leastx]).use = 0;
            (psatom->data[cnum][type][aline][leastx]).task = proc;
        }
    }

    return;
}

/* DATA STORE ROUTINE */
void writref(long addr, int proc)
{
    int index;
    int x, leastx;
    unsigned long leastused;
    long aline, atag;
    int cnum, hit;

    /* PROCESS REFERENCE IN EACH CACHE */
    for (cnum=0; cnum<psatom->numcaches; cnum++)
    {
        int type = (psatom->para[cnum]).type;
        int assoc = (psatom->para[cnum]).assoc[type];

        /* UPDATE STATISTICS */
        ((psatom->stat[cnum][proc]).writcnt)++;

        /* PARSE ADDRESS */
        aline = (addr & (psatom->para[cnum]).lmask[type]) >>
                (psatom->para[cnum]).lshift[type];
        atag = addr >> (psatom->para[cnum]).tshift[type];

        /* UPDATE 'USE BITS' AND CHECK FOR HIT */
        hit = 0;
        for (x=0; x<assoc; x++)
        {
            ((psatom->data[cnum][type][aline][x]).use)++;
            if (((psatom->data[cnum][type][aline][x]).tag == atag) &&
                ((psatom->data[cnum][type][aline][x]).task == proc))
            {
                (psatom->data[cnum][type][aline][x]).use = 0;
                hit = 1;
            }
        }

        /* IF NO HIT, FIND LRU BLOCK TO EVICT */
        if (hit == 0)
        {
            /* FIND LRU BLOCK */
            leastused = 0;
            for (x=0; x<assoc; x++)
            {
                if (((psatom->data[cnum][type][aline][x]).use >= leastused) ||
                    ((psatom->data[cnum][type][aline][x]).task ==
                     psatom->numtasks))
                {
                    leastused = (psatom->data[cnum][type][aline][x]).use;
                    leastx = x;
                }
                if ((psatom->data[cnum][type][aline][x]).task ==
                    psatom->numtasks)
                    x = assoc;
            }

            /* UPDATE STATISTICS */
            ((psatom->stat[cnum][proc]).writmisscnt)++;
            ((psatom->stat[cnum][proc]).interfere[
                (psatom->data[cnum][type][aline][leastx]).task])++;

            /* UPDATE */
            (psatom->data[cnum][type][aline][leastx]).tag = atag;
            (psatom->data[cnum][type][aline][leastx]).use = 0;
            (psatom->data[cnum][type][aline][leastx]).task = proc;
        }
    }

    return;
}


/*  STORE  RESULTS  ROUTINE  */ 
void  printresCint  proc) 

{ 

int  c,x,y; 
stats  total; 

FILE*  file; 

file  =  f open (*' cache. out’*,*' a** )  ; 

f pr int f  (file, ’’DATA  AT  END  OF  PROCESS  y,d\n"  ,proc)  ; 

f printf (file , ”<><><><><><><><><><><><><><><><><><><><><><><><><>\n*' ) ; 
for  (c=0;  c<psatom->nnmcaches;  C++) 

/*  PRINT  CACHE  DATA  */ 


( 


140 


fprintf (file, *'simulat ion:  y,s 


(data  at  end  of  process  y,d)\n*', 
psatom~>name  [0] ,proc) ; 
fprintf  (file, "total  context  switches  modeled:  5flii\n"  ,switchrec)  ; 

fprintf  (f ile, " - \n'')  ; 

fprintf  (file, "CACHE  #  y.d\n",  c)  ; 

fprintf  (file,  "cache  t3rpe:  Xd  (0=iiiiif  ied,  l=split)\n", 

(psatom->para[c] ) .type) ; 

fprintf  (file,  "icache  size:  y,d\n" ,  (psatom->para[c]  )  .csize[03); 
fprintf (file, "icache  line  size:  %d\n" , (psatom“>para[c]  ) .bsize [0]  ) ; 
fprintf  (file,  "icache  associativity:  y,d\n", 

(psatom->para [c]  )  .  assoc  [0]  )  ; 

if  ((psatom~>para[c]  )  .  t3rpe  ==  1) 

{ 

fprintf  (file ,  "dcache  size:  y.d\n",  (psatom->paxa[c]  )  .csizeCl]  ); 
fprintf  (file,  "dcache  line  size:  y,d\n" ,  (psatom->paraCc]  )  .bsize  [1]  )  ; 
fprintf (file, "dcache  associativity:  yd\n", 

(psatom->paraCc] ) . assoc [1]  )  ; 

} 

total. instcnt  =  0; 
total .readcnt  =  0; 
total. writcnt  =  0; 
total . ins t mis sent  =  0; 
total.readmisscnt  =  0; 
total . wr it mis sent  =  0; 

/♦  PRINT  PROCESS  CACHE  PERFORMANCE  */ 
for  (y=0;  y  <  psatom->niimtasks;  y++) 

{ 

int  z; 

total . instcnt  =  total . instcnt  +  (psatom->stat [c] [y] ) . instcnt; 
total  .readcnt  =  total. readcnt  +  (psatom-'>statCc]  [y]  )  .readcnt ; 
total. writent  =  total. writent  +  (psatom->stat [c] Cy] ) .writent ; 
total. ins t mis sent  =  total . ins tmis sent  + 

(psatom->stat [c] [y] ) . instmissent ; 
total.readmisscnt  =  total.readmisscnt  + 

(psatom->stat [c] [y] ) .readmissent ; 
total.wri tmis sent  =  total. writ mis sent  + 

(psatom->stat [c] [y] ) . writmissent ; 
fprintf  (file , "  ♦**ic+***+*\n*' )  j 

fprintf  (file , "  Process  #y,d\n" ,  y)  ; 

fprintf  (file , "  Inst  y,121ii  " ,  (psatom->stat  [c]  [y]  )  .  instcnt)  ; 

fprintf  (file ,  "Miss  y,121u  " ,  (psatom->stat  [c3  Cy]  )  .  instmissent)  ; 
if  (  (psatom->stat  [c]  [y]  )  .  instcnt  !  =  0) 
fprintf  (file,  "Perc  y,.61f",  100.0* 

(psatom->stat  [c]  [y]  )  .  instmissent  / 
(psatom->stat  [c]  [y]  )  .  instcnt)  ; 
fprintf  (file,  "\n  Data  y,121n  ", 

(psatom->stat  [c]  [y]  )  .readcnt  + 
(psatom->stat  [c]  Cy]  )  .writent)  ; 

fprintf  (file,  "Miss  y,121n  ",  (psat  om->st  at  [c]  [y]  )  .readmissent  + 

(psatom->stat  [c]  Cy]  )  .writmissent)  ; 
if  ( ( (psatom“>stat  Cc] Cy] ) . readcnt  + 


141 


(psatoin->stat  [c]  [y]  )  .  writcnt)  !  =  0) 
fprintf (file, "P ere  >(.61f'*,  100.0  * 

( (psatom->stat [c] [y] ) .readmissent  + 
(psatom->stat  [c]  [y] ) . writmissent )  / 
( (psatom->stat  [c]  [y] ) . readent  + 
(psatoin->stat  [c]  [y] ) .  writent )  ) ; 
fprintf  (file, ’*\n  read  y,121u  **, 

(psatom->stat  [c]  [y]  )  .readent)  ; 

fprintf  (file, ’’Miss  y,121u  (psatom->stat  [c]  [y]  )  .readmissent)  ; 
if  ( (psatom->stat [e] Cy] ) .readent  ! =  0) 
fprintf  (file,  "Pere  y..61f",  100.0  * 

(psatom“'>stat  [e]  [y]  )  .readmissent  / 
(psatom->stat  [e]  [y]  )  .readent)  ; 
fprintf  (file,  **\n  writ  y.l21ii 

(psatom-">stat  [e]  [y]  )  . writent)  ; 

fprintf  (file,  **Miss  y,12lTi  ”,  (psatom->stat  [e]  [y]  )  .writmissent) ; 
if  ( (psatom->stat [e] [y] ) . writent  ! =  0) 
fprintf(file,”Pere  y..61f”,  100.0  * 

(psatom->stat [e] [y] ) .writmissent  / 
(psatom-'>stat  [e]  Cy]  )  .writent)  ; 
fprintf(file,”\n  TOTAL  y,121u  ”, 

(psatom->stat [e] Cy] ) . instent  + 
(psatom^>stat  Cc] Cy] ) .readent  + 
(psatom->stat  Ce] Cy] ) .writent) ; 

fprintf  (file, '*Miss  y,121n  ”,  (psatom~>stat  Cc]  Cy]  )  .  instmissent  + 

(psatom“>stat  Ce] Cy] ) .readmissent  + 
(psatom->stat  Cc] Cy] ) .writmissent) ; 
if  ( ( (psatom~>stat  Cc] Cy] ) . instent  + 

(psatom->stat  Ce] Cy] ) .readent  + 

(psatom->statCe] Cy] ) .writent)  !=  0) 
fprintf (file, "Pere  y.61f”,  100.0  * 

( (psatom-“>stat  Cc]  Cy]  )  .  instmissent  + 
(psatom->stat  Cc] Cy] ) .readmissent  + 
(psatom->stat Ce] Cy] ) .writmissent)  / 
( (psatom->stat  Ce] Cy] ) . instent  + 
(psatom->statCe] Cy]) .readent  + 
(psatom->stat  Cc] Cy] ) .writent)) ; 

fprintf  (file,  ”\n  Int  (times  proeess  y,d  overwrote :  )\n”,  y)  ; 

for  (z=0;  z  <=  psatom-'>nnmtasks;  z++) 

fprintf  (file,”  Proeess  Xd  =  y,121n\n”,  z, 

(psatom->stat  Cc] Cy] ) . int erf ere Cz] ) ; 
fprintf (file,”  (process  %d  is  invalid  data)\n”, 

psatom->mimtas}cs) ; 

> 

/*  PRINT  TOTAL  CACHE  PERFORMANCE  */ 

fprintf  (file  ,  ”  +  • 

        fprintf(file, " TOTAL FOR CACHE\n");
        fprintf(file, " Inst %12lu ", total.instcnt);
        fprintf(file, "Miss %12lu ", total.instmisscnt);
        if (total.instcnt != 0)
            fprintf(file, "Perc %.6lf", 100.0 * total.instmisscnt /
                    total.instcnt);
        fprintf(file, "\n Data %12lu ", total.readcnt +
                total.writcnt);
        fprintf(file, "Miss %12lu ", total.readmisscnt + total.writmisscnt);
        if ((total.readcnt + total.writcnt) != 0)
            fprintf(file, "Perc %.6lf", 100.0 *
                    (total.readmisscnt + total.writmisscnt) /
                    (total.readcnt + total.writcnt));
        fprintf(file, "\n read %12lu ", total.readcnt);
        fprintf(file, "Miss %12lu ", total.readmisscnt);
        if (total.readcnt != 0)
            fprintf(file, "Perc %.6lf", 100.0 * total.readmisscnt /
                    total.readcnt);
        fprintf(file, "\n writ %12lu ", total.writcnt);
        fprintf(file, "Miss %12lu ", total.writmisscnt);
        if (total.writcnt != 0)
            fprintf(file, "Perc %.6lf", 100.0 * total.writmisscnt /
                    total.writcnt);
        fprintf(file, "\n TOTAL %12lu ", total.instcnt +
                total.readcnt +
                total.writcnt);
        fprintf(file, "Miss %12lu ", total.instmisscnt +
                total.readmisscnt +
                total.writmisscnt);
        if ((total.instcnt + total.readcnt + total.writcnt) != 0)
            fprintf(file, "Perc %.6lf", 100.0 * (total.instmisscnt +
                    total.readmisscnt +
                    total.writmisscnt) /
                    (total.instcnt +
                     total.readcnt +
                     total.writcnt));
        fprintf(file, "\n");
        fprintf(file, "\f");
    }

    fclose(file);

    /* IF LAST PROCESS, SHUT DOWN SIMULATION */
    psatom->count--;
    if (psatom->count == 0)
    {
        psatom->numcaches = 0;
        psatom->numtasks = 0;
    }

    return;
}
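
The reporting code in this listing repeats one pattern for every category (Inst, Data, read, writ, TOTAL): print the reference count, print the miss count, and print the miss percentage only when the reference count is nonzero. The sketch below shows that pattern factored into two helpers. The names miss_percent and print_miss_line are hypothetical and do not appear in the thesis code, and the output strings only approximate the listing's format.

```c
#include <stdio.h>

/* Hypothetical helper (not part of the thesis listing): miss rate in percent.
 * Returns 0.0 when there were no references, so the caller never divides
 * by zero; the listing achieves the same with its "!= 0" guards. */
static double miss_percent(unsigned long misses, unsigned long refs)
{
    if (refs == 0)
        return 0.0;
    return 100.0 * (double)misses / (double)refs;
}

/* Hypothetical helper: print one "<label> <refs> Miss <misses> Perc <rate>"
 * group. The percentage is omitted when refs is zero, as in the listing. */
static void print_miss_line(FILE *file, const char *label,
                            unsigned long refs, unsigned long misses)
{
    fprintf(file, "\n %s %12lu ", label, refs);
    fprintf(file, "Miss %12lu ", misses);
    if (refs != 0)
        fprintf(file, "Perc %.6f", miss_percent(misses, refs));
}
```

Under these assumptions, a call such as print_miss_line(file, "read", total.readcnt, total.readmisscnt) would stand in for one three-statement group of the listing.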




B  Tables  of  Simulation  Results 

Key  to  data  tables: 

Miss  Data 

•  Inst  =  instruction  fetch  misses 

•  Read  =  data  read  misses 

•  Write  =  data  write  misses 

•  Data  =  total  data  read  and  write  misses 

•  Total  =  total  misses 

•  %  =  miss  rate 

Interference  Data  (Int(95^)) 

•  Process 0 is the kernel, except for simulations with the context switch model, where process 0 is the test program.

•  Additional process numbers are shown in the same order as the tables.

•  The extra process is for cases where invalid data is overwritten (at simulation start).
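
The key implies two relations between the miss columns: Data is the sum of Read and Write, and Total is the sum of Inst and Data. A minimal sketch of a check for these relations follows. The struct and function names are hypothetical, introduced only for illustration, and the sample values in the usage note are the first legible row of Table 8 (Espresso Alone).

```c
/* Hypothetical record for the miss counts of one table row;
 * the field names are illustrative, not taken from the thesis code. */
struct miss_row {
    unsigned long inst;   /* instruction fetch misses */
    unsigned long read;   /* data read misses         */
    unsigned long write;  /* data write misses        */
    unsigned long data;   /* Data column              */
    unsigned long total;  /* Total column             */
};

/* Returns 1 when Data = Read + Write and Total = Inst + Data, else 0. */
static int row_is_consistent(const struct miss_row *r)
{
    return r->data == r->read + r->write &&
           r->total == r->inst + r->data;
}
```

For the row with Inst 15465, Read 262666, Write 115398, Data 378064 and Total 393529, both relations hold. A check of this kind can help spot digits that were misread when the tables were scanned.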

B.1  Compress Alone

Compress data:  Table 6

B.2  GCC Alone

GCC data:  Table 7

B.3  Espresso Alone

Espresso data:  Table 8

B.4  Alvinn Alone

Alvinn data:  Table 9

B.5  Compress w/ Operating System

Compress data:  Table 10
Operating System data:  Table 11
Combined data:  Table 12

B.6  GCC w/ Operating System

GCC data:  Table 13
Operating System data:  Table 14
Combined data:  Table 15

B.7  Espresso w/ Operating System

Espresso data:  Table 16
Operating System data:  Table 17
Combined data:  Table 18

B.8  Alvinn w/ Operating System

Alvinn data:  Table 19
Operating System data:  Table 20
Combined data:  Table 21

B.9  Compress and GCC w/ Operating System

Compress data:  Table 22
GCC data:  Table 23
Operating System data:  Table 24
Combined data:  Table 25

B.10  Compress and Espresso w/ Operating System

Compress data:  Table 26
Espresso data:  Table 27
Operating System data:  Table 28
Combined data:  Table 29

B.11  GCC and Espresso w/ Operating System

GCC data:  Table 30
Espresso data:  Table 31
Operating System data:  Table 32
Combined data:  Table 33

B.12  Compress w/ Model, n=1

Compress data:  Table 34

B.13  GCC w/ Model, n=1

GCC data:  Table 35

B.14  Espresso w/ Model, n=1

Espresso data:  Table 36

B.15  Alvinn w/ Model, n=1

Alvinn data:  Table 37

B.16  Compress w/ Model, n=2

Compress data:  Table 38

B.17  GCC w/ Model, n=2

GCC data:  Table 39

B.18  Espresso w/ Model, n=2

Espresso data:  Table 40

Table  6:  Compress  Alone 


  Inst       %     Read        %   Write       %     Data        %    Total       %      Int

   157  0.0002  3276713  14.6203   18121  0.2126  3294634  10.6513  3294991  2.7928  3294578
   218  0.0003  3642992  16.2546   80614  0.9460  3723606  12.0374  3723824  3.1563  3723606
    96  0.0001  3431770  15.3122   23850  0.2799  3455620  11.1711  3455716  2.9291  3455496
    96  0.0001  3376695  15.0664   13679  0.1605  3390374  10.9601  3390470  2.8738  3390247


Table  7:  GCC  Alone 


[Rotated table; the per-cache rows are not legible in this copy. Column labels that survive: Cache, Inst, Read, Write, Data, Total, each miss count followed by its miss rate (%).]

Reference Statistics:

Total Instruction References   160240141
Data Reads                      50197329
Data Writes                     19074844
Total Data References           69272173
Total References               229512314



Table  8:  Espresso  Alone 


[Rotated table; only the rows below are legible in this copy. The Reference Statistics block (Total Instruction References, Total Data References, Total References) is not legible.]

  Inst       %     Read        %   Write       %     Data        %    Total       %      Int

 15465  0.0016   262666   0.1163  115398  0.1926   378064   0.1324   393529  0.0311   393017
122332  0.0125  5096588   2.2573  644486  1.0765  5741074   2.0099  5863406  0.4641  5863150
 51647  0.0053  1849075   0.8190  250909  0.4191  2099984   0.7352  2151631  0.1703  2151375
 21193  0.0022   348444   0.1543   94040  0.1571   442484   0.1549   463677  0.0367   463421


Table  9:  Alvinn  Alone 


[Only the end of one row is legible in this copy: Total 7708429, % 0.1080, Int 7708178 and 251.]


Table  10:  Compress  w/  Operating  System,  Compress  Data 


  Inst       %     Read        %   Write       %     Data        %    Total       %      Int      Int

  1356  0.0016  3317306  14.8015   21238  0.2492  3338544  10.7926  3339900  2.8309   227402  3112494
  2117  0.0024  3684337  16.4391   83660  0.9817  3767997  12.1809  3770114  3.1956   232668  3537442
  1472  0.0017  3467971  15.4737   28443  0.3338  3496414  11.3029  3497686  2.9648   256781  3241101
  1234  0.0014  3421706  15.2673   20208  0.2371  3441914  11.1268  3443148  2.9164   268756  3174388


Table  11:  Compress  w/  Operating  System,  Operating  System  Data 




Table  12:  Compress  w/  Operating  System,  Combined  Data 


[Rotated table; only the rows below are legible in this copy. The Reference Statistics block (Total Instruction References, Data Reads, Data Writes, Total Data References, Total References) is not legible.]

  Inst       %     Read        %   Write       %     Data        %    Total       %

 36385  0.0414  3543943  14.8090   63258  0.6784  3607201  10.8471  3645586  2.8963
 43566  0.0470  3992124  16.6819  125259  1.3434  4117383  12.3813  4160949  3.3058
 35987  0.0389  3753016  15.6827   64938  0.6965  3817954  11.4809  3853941  3.0619
 31284  0.0338  3702521  15.4717   54683  0.5865  3757204  11.2982  3788488  3.0099


Table  13:  GCC  w/  Operating  System,  GCC  Data 




Table  14:  GCC  w/  Operating  System,  Operating  System  Data 


[Table body illegible in the scanned source. Legible labels: Reference Statistics (Total Instruction References, Total Data References, Data Reads, Data Writes, Total References); Miss Statistics.]


Table  15:  GCC  w/  Operating  System,  Combined  Data 




Table  16:  Espresso  w/  Operating  System,  Espresso  Data 


[Upper rows of this table are illegible in the scanned source; the legible final rows follow.]

52667  0.0054  672626  0.2979  224966  0.3758  897592  0.3142  950259  0.0752  280526  669729

263601  0.0270  6566647  2.9084  962236  1.6073  7528883  2.6357  7792484  0.6168  1011811  6780669

170088  0.0174  2805531  1.2426  524484  0.8761  3330015  1.1658  3500103  0.2770  767670  2732429

59067  0.0060  941089  0.4168  218157  0.3644  1159246  0.4058  1218313  0.0964  427854  790455


Table  17:  Espresso  w/  Operating  System,  Operating  System  Data 


[Table body illegible in the scanned source. Recoverable entries are listed below.]

Column headings: Cache, Inst, Read, Write, Data, Total

Reference Statistics:

Total Instruction References  29093428

Data Reads  9107479

Data Writes  3585537

Total Data References  12693016

Total References  41786444


Table  18:  Espresso  w/  Operating  System,  Combined  Data 




151878  0.0151  962150  0.4096  321148  0.5061  1283298  0.4301  1435176  0.1100

512251  0.0509  7989084  3.4012  1217965  1.9195  9207049  3.0861  9719300  0.7446

342703  0.0340  3731420  1.5886  748628  1.1798  4480048  1.5017  4822751  0.3695

148299  0.0147  1509227  0.6425  333886  0.5262  1843113  0.6178  1991412  0.1526


Table  19:  Alvinn  w/  Operating  System,  Alvinn  Data 


[Table body illegible in the scanned source. Recoverable entries are listed below.]

Reference Statistics:

Total Instruction References  5233222045

Data Reads  1415013630

Data Writes  [illegible]

Total Data References  1902442104

Total References  7135664149


784018  0.0150  17141520  1.2114  445411  0.0914  17586931  0.9244  18370949  0.2575  2901456  15469490

140147  0.0027  15773499  1.1147  245530  0.0504  16019029  0.8420  16159176  0.2265  1471488  14687685

1544996  0.0295  35821055  2.5315  1064827  0.2185  36885882  1.9389  38430878  0.5386  5140263  33290612

867060  0.0166  10367353  0.7327  457435  0.0938  10824788  0.5690  11691848  0.1639  3328570  8363275

203514  0.0039  8813128  0.6228  259603  0.0533  9072731  0.4769  9276245  0.1300  1922891  7353351


Table  20:  Alvinn  w/  Operating  System,  Operating  System  Data 


[Rotated table body; largely illegible in the scanned source. The values below are those that remain legible.]

Reference Statistics:

Total Instruction References  197365478
Total Data References  86400062

Miss Statistics:

Legible count / rate (%) pairs, in scan order; their row assignment is not recoverable:

9511514  15.7441
19994443  33.0961
19299145  31.9452
23847634  39.4742
20191087  33.4216
19254334  31.8711
27647107  45.7633
16767809  27.7552
13487383  22.3252
10949622  18.1245
13387377  22.1597
16511427  27.3308
10587061  17.5244
7521837  12.4506
3989905  6.6044
9761843  16.1585
7704352  12.7528
10302378  17.0532
6871424  11.3740


105590  0.0535  1817606  3.0086  187240  0.7205  2004846  2.3204  2110436  0.7437  636498  1471429  509

978247  0.4957  9773521  16.1778  799544  3.0767  10573065  12.2373  11551312  4.0707  6410631  5140228  253

299892  0.1519  5876335  9.7269  351855  1.3540  6228190  7.2085  6528082  2.3005  3199290  3326539  253

128968  0.0653  3212333  5.3173  141441  0.5443  3353774  3.8817  3482742  1.2273  1559631  1922856  [illegible]


Table  21:  Alvinn  w/  Operating  System,  Combined  Data 




Table  22:  Compress  and  GCC  w/  Operating  System,  Compress  Data 




17018  0.0196  3530748  15.7536  37072  0.4350  3567820  11.5338  3584838  3.0385  567282  2365119

56842  0.0653  3923724  17.5073  107499  1.2615  4031223  13.0318  4088065  3.4651  490762  2709049

22294  0.0256  3700435  16.5110  65792  0.7721  3766227  12.1752  3788521  3.2112  598070  2309659

16515  0.0190  3660335  16.3320  48826  0.5730  3709161  11.9907  3725676  3.15[illegible]  637518  2236267


Table  24:  Compress  and  GCC  w/  Operating  System,  Operating  System  Data 


[Rotated table body; largely illegible in the scanned source. The values below are those that remain legible.]

Reference Statistics:

Total Instruction References  28102411
Total Data References  11628661

Miss Statistics:

Legible count / rate (%) pairs, in scan order; their row assignment is not recoverable:

1108793  14.8459
664638  8.89[illegible]
1163339  15.5763


525479  1.8699  797538  10.6785  200547  4.8208  998085  8.5830  1523564  3.8347  659135  566095  297826

564227  2.0078  1277802  17.1089  261602  6.2885  1539404  13.2380  2103631  5.2947  1204686  486911  411582

463834  1.6505  1034358  13.8493  188460  4.5303  1222818  10.5156  1686652  4.2452  736025  592591  357754

414239  1.4740  970269  12.9912  173938  4.1812  1144207  9.8395  1558446  3.9225  593576  634887  329729


Table  25:  Compress  and  GCC  w/  Operating  System,  Combined  Data 


[Rotated table body; largely illegible in the scanned source. The reference statistics and the Inst miss column are recoverable and are listed below; the Read, Write, Data, and Total miss columns are too damaged to align with their rows.]

Reference Statistics:

Total Instruction References  183169983
Data Reads  51099459
Data writes  20776106
Total Data References  71875565
Total References  255045548

Miss Statistics, Inst column (misses, miss rate in %), in row order:

7141606  3.8989
5459773  2.9807
3381548  1.8461
[illegible]  0.9745
9666071  5.2771
8260134  4.5095
8089386  4.4163
6863121  3.7469
5968010  3.2582
5886479  3.2137
5243044  2.8624
4368833  2.3851
4285036  2.3394
6407881  3.4983
5647782  3.0834
5217620  2.8485
4688201  2.5595
4266958  2.3295
4065622  2.2196
3512693  1.9177
[illegible rows]
4264206  2.3280
3527555  1.9258
2874060  1.5691
3193103  1.7432
[illegible]  [illegible]
2318963  1.2660
2448348  1.3367
2153619  1.1757
1955644  1.0677
1998328  1.0910
1527336  0.8338
1222354  0.6673
1531741  0.8362
1208430  0.6597
1000247  0.5461
1226505  0.6696
981146  0.5356
851381  0.4648


Table  26:  Compress  and  Espresso  w/  Operating  System,  Compress  Data 


3011  0.0035  3433626  15.3205  24092  0.2827  3457720  11.1779  3460731  2.9333  441366  2619103  400258

20201  0.0232  3833700  17.1056  93217  1.0939  3926917  12.6946  3947116  3.3456  427636  3029862  489614

3082  0.0035  3589189  16.0146  39686  0.4657  3628875  11.7312  3631957  3.0785  500643  2683794  1597469

3210  0.0037  3534656  15.7713  25632  0.3008  3560288  11.5094  3563498  3.0204  533372  2618974  411148


Table  27:  Compress  and  Espresso  w/  Operating  System,  Espresso  Data 


[Rotated table body; illegible in the scanned source.]


Table  28:  Compress  and  Espresso  w/  Operating  System,  Operating  System  Data 


168186  1.0822  836537  19.4053  140753  6.2633  977290  14.9020  1145476  5.1832  625095  428273  91856

130049  0.8368  691857  16.0491  112996  5.0282  804853  12.2726  934902  4.2303  333171  500608  100871

116437  0.7492  662684  15.3724  103300  4.5967  765984  11.6799  882421  3.9929  261097  532430  88642


Table  29:  Compress  and  Espresso  w/  Operating  System,  Combined  Data 


[Table body illegible in the source scan.]


Table  31:  GCC  and  Espresso  w/  Operating  System,  Espresso  Data 




354767  0.1584  1021966  1.9987  203020  1.6781  1224986  1.9374  1579753  0.5500  443086  681871  454796

251429  0.1122  632883  1.2378  138431  1.1443  771314  1.2199  1022743  0.3561  321370  456978  244395

435774  0.1945  2205351  4.3131  254476  2.1035  2459827  3.8903  2895601  1.0081  422012  ^4590  1^8999

329531  0.1471  1189631  2.3266  185348  1.5321  1374979  2.1746  1704510  0.5934  451474  712621  540415

252814  0.1129  736403  1.4402  121971  1.0062  858374  1.3576  1111188  0.3868  370143  503201  237844


Table  32:  GCC  and  Espresso  w/  Operating  System,  Operating  System  Data 


[Table body illegible in the source scan.]


729664  1.8707  733830  6.8212  237469  4.2461  971299  5.9404  1700963  3.0728  794152  561174  325128

877237  2.2491  1793502  16.6712  397861  7.1141  2191363  13.4023  3068600  5.5435  1704283  946851  417213

731182  1.8746  1301099  12.0941  292038  5.2219  1593137  9.7436  2324319  4.1989  1031922  842595  449549

609859  1.5636  1118895  10.4005  225859  4.0386  1344754  8.2245  1954613  3.5310  832647  749839  371S7?


Table  33:  GCC  and  Espresso  w/  Operating  System,  Combined  Data 




5445158  0.9518


Table  34:  Compress  w/  Model,  n=1


1628  0.0000  3329522  0.1486  22173  0.0026  3351695  0.1084  3353323  0.0284  2998066  355022  235

2145  0.0000  3677102  0.1641  82525  0.0097  3759627  0.1215  3761772  0.0319  3573411  188236  125

1297  0.0000  3464412  0.1546  31646  0.0037  3496058  0.1130  3497355  0.0296  3301958  195253  144

1066  0.0000  3408615  0.1521  17548  0.0021  3426163  0.1108  3427229  0.0290  3227555  199525  149


Table  35:  GCC  w/  Model,  n=1


[Table body illegible in the source scan.]



591708  0.6542  1137861  0.4956  832022  305608


Table  36:  Espresso  w/  Model,  n=1


[Table body illegible in the source scan.]

1 



Table  37:  Alvinn  w/  Model,  n=1


[Table body illegible in the source scan.]

Reference  Statistics:

Total  Instruction  References  5233222102

Data  Reads  1415013649

Data  writes  487428474

Total  Data  References  1902442123

Total  References  7135664225



709074  0.0135  18726638  1.3234  205511  0.0422  18932149  0.9951  19641223  0.2753  6108357  13532639

126862  0.0024  18441867  1.3033  173413  0.0356  18615280  0.9785  18742142  0.2627  4889262  13852639

998025  0.0191  35824706  2.5318  750628  0.1540  36575334  1.9225  37573359  0.5266  30612516  6960711

781205  0.0149  10394425  0.7346  154789  0.0318  10549214  0.5545  11330419  0.1588  3928174  7402100

74918  0.0014  9802574  0.6928  127130  0.0261  9929704  0.5219  10004622  0.1402  2422673  7581796


Table  38:  Compress  w/  Model,  n=2 


Table  39:  GCC  w/  Model,  n=2


[Table body illegible in the source scan.]




Table  40:  Espresso  w/  Model,  n=2 


1 

I 

1 

1 

1 

j 

i 

1 

1 

1 

I 

1 

1 

1 

00 

CM 

CO 

3513251 

1 

1 

I 

CD 

ID 

z 

s 

909087 

967043 

CM 

CO 

N 

Tf 

CT 

o: 

4549697 

5091955 

5384733 

2527368 

2801086 

1 

1 

00 

CD 

CO 

CO 

z 

s 

oo 

h- 

T-  O 
h-  CM 

In  id 

CO  CD 
CO  CM 
«r-  O 
D  CD 

6363164 

3023154 

3512496 

3722772 

1760373 

2048622 

2177421 

3201889 

3466560 

3583967 

1971659 

2138886 

CO 

CO 

s 

s; 

CM 

12067741 

1^ 

CO 

CD 

O 

CO 

1393372 

1 

i 

I 

1 

1 

1 

1 

Int(O) 

21150667 

CO 

CO 

CO 

o 

CO 

o 

o 

o 

CO 

CO 

5 

S 

CO 

CO 

S 

40760522 

27060264 

0) 

ID 

CO 

Oi 

w 

CM 

N. 

O 

CM 

CO 

K 

1^ 

CO 

CM 

CM 

s 

8 

CM 

CM 

o 

CO 

te 

s 

00 

40063746 

22880963 

18638683 

25651103 

13156604 

9869207 

CO 

CM 

CO 

ID 

ft 

CM 

CM 

CO 

o 

s 

o 

z 

7381267 

CO 

CD 

ID 

CO 

W 

CO 

CM 

10000023 

6896588 

4238284 

2295695 

10726913 

3467498 

1683180 

11116485 

3333271 

1505508 

4889018: 

11710171 

286574 

5276946 

o 

o 

co 

CD 

CD 

"oi 

z 

z 

CM 

CM 

ft 

CO 

1905639 

2800271 

o 

i 

q 

0.6495 

i 

O) 

CM 

CO 

O) 

o 

00 

'a- 

CO 

Oi 

Oi 

M- 

CM 

O 

z 

CM 

o 

CM 

CO* 

CD 

CD 

CO 

O) 

CM 

CO 

r^ 

CD 

3.2430 

1.8876 

o 

z 

CD 

S 

CD 

CO 

oi 

1 

ft 

o 

CM 

CO 

o 

o 

o 

CM 

z 

z 

ID 

CO 

s 

d 

S 

ID 

05 

h. 

CO 

d 

d 

0.6739 

i.yoiy 

0.8127 

0.6854 

CO 

00 

00 

o 

0.5525 

0.4279 

Tm 

CD 

q 

0.4260 

ID 

05 

CM 

d 

0.6404 

IN. 

CO 

CO 

d 

1 

0.5737 

0.2640 

0.1964 

0.6292 

0.2545 

0.13251 

S 

CD 
TO  S 

^  CM 

& 

CD 

CO 

CO 

ID 

CM 

CD 

O 

CM 

00 

00 

CO 

fS 

o 

Oi 

o 

s 

& 

CO 

CO 

CO 

CO 

IS 

o 

CO 

00 

Oi 

CM 

00 

CD 

§ 

CM 

s 

i 

ID 

Oi 

CO 

z 

CD 

s 

3 

CM 

CO 

Oi 

Oi 

40972897 

23848070 

Oi 

CM 

CD 

CO 

CO 

CO 

CD 

00 

CO 

o 

CM 

o 

CO 

CD 

00 

CD 

00 

s 

00 

CD 

CD 

CO 

z 

CM 

ID 

s 

CD 

CM 

CM 

ID 

CM 

z 

r- 

o 

CM 

CO 

z 

z 

o 

s 

ID 

CO 

R 

s 

z 

CO 

z 

o 

h- 

ID 

TT 

ID 

00 

1 

10268362 

o 

CO 

I 

ID 

CD 

00 

13750370 

6980328 

ID 

CM 

CO 

CO 

O 

S 

Td 

z 

IN¬ 

IN. 

CO 

CM 

5382099! 

3683157 

o 

00 

CM 

Z 

o 

CO 

CD 

CD 

IN. 

CO 

CO 

3870983 

72488431 

CO 

ID 

z 

CO 

CO 

o 

U) 

CO 

z 

s 

<D 

CO 

CO 

CD 

-M- 

CD 

h- 

o 

00 

ID 

CM 

CO 

16735021 

CO 

r^ 

5?  S 

uj 

O) 

ID 

CO 

O 

CO 

W 

f'* 

5) 

0.8971 

d 

o 

o 

o 

d 

CD 

CO 

h- 

<0 

ID 

TT 

O) 

CD* 

ID 

CM 

z 

d 

<r- 

O 

CO 

CO 

cd 

CM 

CO 

CO 

CO 

rr 

CO 

CO 

o 

CO 

CO 

ID 

Oi 

'fl¬ 

ed 

CD 

ID 

CO 

ID* 

w 

CO 

00 

cd 

CM 

CO 

CD 

•M-* 

O 

h- 

■M- 

TT 

ID 

cd 

M- 

z 

cd 

CO 

CO 

00 

cvi 

z 

h- 

■M- 

CD 

o 

z 

o 

cd 

2.3873 

H.D  1  ID 

2.8573 

2.4790 

00 

IN. 

tN 

CO 

cd 

CO 

o 

o 

o 

CM 

O 

CO 

CO 

ID 

ID 

CD 

r-v 

CO 

cd 

o 

CM 

o 

CO 

CM 

N 

q 

2.6002 

1.4383! 

1.2087 

CM 

CD 

CO 

CO 

cvi 

o 

ID 

O 

0.77651 

z 

IN. 

CO 

cvi 

"co 

CO 

-M- 

q 

CO 

'fl- 

CM 

ID 

d 

uaia 

14446038 

o 

CM 

o> 

CD 

CO 

d 

CO 

CO 

s 

s 

CO 

ID 

CM 

o 

CD 

CD 

CD 

CD 

00 

CM 

s 

z 

Oi 

z 

CM 

Oi 

CO 

<3> 

CO 

00 

Oi 

CO 

Oi 

ID 

00 

CO 

00 

CM 

CO 

CO 

R 

CM 

ID 

CM 

CD 

o 

o 

o 

Oi 

CO 

o 

o 

o 

CD 

Oi 

CM 

CM 

CO 

CO 

co 

s 

ID 

CD 

O 

ID 

CM 

CO 

5 

■M- 

CM 

CD 

0- 

ID 

VO 

CD 

CM 

S 

S 

00 

CM 

z 

00 

Z 

CD 

CO 

it 

K 

N 

00 

CD 

i 

CD 

CO 

TJ- 

s 

CO 

S 

o 

CD 

s 

ID 

CM 

ID 

CO 

CO 

CD 

00 

6819149 

1 O 1 / DV 1 V 

8161675 

7081311 

11076689 

CD 

O 

co 

ID 

R 

ID 

4464621 

11081731 

o' 

00 

o 

CD 

IN. 

ID 

'fl- 

z 

ID 

CM 

CD 

O 

CO 

74274471 

[s 

CO 

o 

34526371 

'cd' 

CO 

ID 

SI 

CO 

CO 

30014641 

CD 

TT 

z 

w 

CM 

ft 

CD 

00 

S 

CO 

IN. 

"co 

o 

o 

CD 

CD 

CM 

14984621 

Vo 

3.7426 

O 

h- 

CM 

cvi 

ID 

z 

CO 

d 

CD 

CO 

00 

cd 

O) 

CO 

h- 

ID 

CD 

i 

flO 

C» 

Oi 

rw 

ID* 

CM 

z 

CM 

CO 

CO 

ID 

CO 

cd 

rv 

CO 

CO 

ID 

ID 

CM 

CO 

CO 

CO 

cd 

O 

CO 

00 

cd 

z 

o 

CD 

TT 

CO 

z 

cd 

cd 

3.6397 

2.5734 

00 

o 

CO 

CM 

CM 

CM 

O 

CO 

CM 

cd 

2.0569 

'M-  0 
CD  C 
y-  a 

IN.  c 

C 

2.8383 

h- 

CM 

CD 

■M- 

CM* 

CO 

o 

IN. 

CO 

cvi 

I'- 

CO 

CD 

d 

5 

CO 

CM 

ID 

TT 

CM* 

"id' 

CM 

■<d' 

•M- 

s 

d 

CD 

O 

CM 

"oi 

o 

00 

CO 

CO 

ID 

CD 

IS 

■M- 

o 

q 

00 

o 

z 

d 

Id 

CM 

O 

In. 

d 

"d 

CO 

5 

0.70331 

0.42231 

CM 
O 
CD 
ID  O 
Z  W 
>  ^ 

CO 

CJ> 

O) 

CM 

ID 

CM 

CO 

W 

O 

CM 

CM 

O 

o 

CO 

rr 

CM 

CD 

z 

o> 

o 

TT 

s 

CD 

CD 

CO 

CO 

CO 

CO 

CM 

CM 

€0 

00 

00 

CM 

00 

CM 

CO 

z 

flO 

CO 

s 

CM 

h- 

Oi 

CO 

00 

CM 

3344592 

'fl- 

Oi 

00 

CD 

f'- 

CM 

1907368 

O 

h- 

CO 

00 

hv 

00 

CM 

CD 

CO 

CD 

ID 

CO 

CM 

CM 

z 

CM 

z 

CM 

o 

o 

CD 

CM 

co 

CO 

CO 

o 

s 

1 

1 

1 

1 

1 

1 

1 

'oi 

00 

Id 

IS 

CO 

CD 

§ 

Oi 

z 

CM 

ID 

z 

4205451 

923919 

421046 

2528321 

3.1973 

5063930  2.5972

2456210  1.0879

17976041  0.7962

6726978  2.9794

2569157  1.1379

977787939 

225779348 

59867420 

285646768 

1263434707 

7218927 

8004054 

Reference  Statistics: 

0.5139 

Data Reads

Data Writes

Total Data References

Total References

8847616 

5025237 

2729928 

1207257 

15413408 

10074322 

8874359 

10821119 

6926166 

6039292 

8676897 

5292937 

4820279 

10685376 

5414347 

1578149 

2673681 

1265019 

941704 

1795314 

910909 

620603

6638331 

529604 

418346 

424277 

333994 

263202 

298772 

224977 

1751201 



